
(1)

IScIDE 2012

Nanjing

(2)

Autoencoder for Polysemous Word

Wei-Chen Cheng Jiun-Wei Liou

Daw-Ran Liou

Cheng-Yuan Liou *

Dept. of Computer Science and Information Engineering, National Taiwan University

(3)

Introduction & review

Generating a code for each word

Cheng-Yuan Liou, Jau-Chi Huang, Wen-Chie Yang: Modeling word perception using the Elman network. Neurocomputing 71 (2008) 3150–3157.

(4)

Generating attributes for each word by linguistic experts

$$\text{Bank} = \begin{bmatrix} \text{attribute } 1 = \text{water} \\ \text{attribute } 2 = \text{earth} \\ \vdots \\ \text{attribute } R = \ldots \end{bmatrix}$$

(5)

Manually assigned attributes are costly and controversial.

Generate them automatically!

(6)

Predicting the next word's attributes

9/28/2012 6

(7)

Figure: Illustration of the Elman network.


(8)

Renewed attributes

Update the network's weights after each word is presented, to reduce the prediction error.

After each training pass, the averaged prediction for each word is used as that word's renewed attributes.
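The training loop above can be sketched as follows. This is a minimal, hypothetical pure-Python illustration, not the authors' implementation: the slides do not specify the activation, learning rule, learning rate, code length R, or hidden-layer size, so sigmoid units, squared-error gradient descent with the context layer treated as extra input (one-step backpropagation), R = 4, six hidden units, and the toy sentence are all assumptions. Each word's current code is the input, the next word's code is the target, and after a pass every word's code is replaced by the average of the predictions the network made for it.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ElmanNet:
    """Minimal Elman network: input + context (previous hidden state) -> hidden -> output."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = init(n_hidden, n_in)        # input -> hidden
        self.w_ctx = init(n_hidden, n_hidden)   # context -> hidden
        self.w_out = init(n_out, n_hidden)      # hidden -> output
        self.context = [0.0] * n_hidden

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(wi, x)) +
                     sum(w * c for w, c in zip(wc, self.context)))
             for wi, wc in zip(self.w_in, self.w_ctx)]
        y = [sigmoid(sum(w * v for w, v in zip(wo, h))) for wo in self.w_out]
        return h, y

    def train_step(self, x, target, lr):
        """Predict the next code, take one gradient step on squared error, return the prediction."""
        ctx = self.context
        h, y = self.forward(x)
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Context units are treated as extra inputs (no backpropagation through time)
        d_hid = [h[j] * (1.0 - h[j]) * sum(d_out[k] * self.w_out[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w_out[k][j] -= lr * dk * hj
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w_in[j][i] -= lr * dj * xi
            for i, ci in enumerate(ctx):
                self.w_ctx[j][i] -= lr * dj * ci
        self.context = h  # copy the hidden state into the context units
        return y


def training_pass(net, sequence, codes, lr=0.1):
    """One pass over the text; returns each word's averaged prediction (renewed attributes)."""
    sums = {w: [0.0] * len(c) for w, c in codes.items()}
    counts = dict.fromkeys(codes, 0)
    net.context = [0.0] * len(net.context)
    for cur, nxt in zip(sequence, sequence[1:]):
        y = net.train_step(codes[cur], codes[nxt], lr)
        sums[nxt] = [s + v for s, v in zip(sums[nxt], y)]
        counts[nxt] += 1
    # Words never seen as a "next word" keep their previous code
    return {w: ([s / counts[w] for s in sums[w]] if counts[w] else codes[w]) for w in codes}


# Hypothetical toy setup: R = 4 attributes per word, 6 hidden units
R = 4
rng = random.Random(1)
sequence = "the river bank the money bank the river bank".split()
codes = {w: [rng.random() for _ in range(R)] for w in sorted(set(sequence))}
net = ElmanNet(R, 6, R)
for _ in range(3):
    codes = training_pass(net, sequence, codes)
```

Because the targets are themselves renewed after every pass, the codes are a moving target. This sketch only shows the mechanics; any normalization or stopping criterion the authors used is not given in the slides and is omitted.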

(9)

Generated attributes

Semantic categorization

Indexing

Ranking

Stylistic analysis

(10)

Categorization of Shakespeare's 36 plays

(11)


(12)

c: comedy, r: romance, h: history, t: tragedy

The number denotes the publication year.


(13)

Indexing results without keywords

http://red.csie.ntu.edu.tw/demo/literal/SAS.htm

Query: "she loves kiss". Search result in Shakespeare's plays:

BENVOLIO: Tut, you saw her fair, none else being by, herself poised with herself in either eye; but in that crystal scales let there be weigh'd your lady's love against some other maid that I will show you shining at this feast, and she shall scant show well that now shows best.

‐Romeo and Juliet

Query: "armies die in blood". Search result in Shakespeare's plays:

MARCUS ANDRONICUS: Which of your hands hath not defended Rome, and rear'd aloft the bloody battle-axe, writing destruction on the enemy's castle? O, none of both but are of high desert: my hand hath been but idle; let it serve to ransom my two nephews from their death; then have I kept it to a worthy end.

‐Titus Andronicus


(14)

Ranking Shakespeare's 36 plays

Authorship

(15)


(16)

Stylistic analysis

(17)

Table: ‘RSMD’ values of William Shakespeare’s plays


(18)

Table: ‘RSMD’ values of William Shakespeare’s works


(19)

Polysemous word

• Conceptual difficulty

• A many-to-one mapping is a function; a one-to-many mapping is not.

(20)

Polysemous word

Build a meaning pool matrix, M, for each word.

M holds B candidate meanings (B candidates) in its column vectors.
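A meaning pool can be represented as an R × B array whose columns are candidate codes. The sketch below is illustrative only: uniform random initialization in [0, 1], R = 4, and the helper names `make_meaning_pool` and `meaning` are assumptions, not taken from the slides; B = 2 matches the 'bank' example that follows.

```python
import random


def make_meaning_pool(R, B, rng):
    """Return an R x B pool M as a list of R rows; column b is candidate meaning b."""
    return [[rng.uniform(0.0, 1.0) for _ in range(B)] for _ in range(R)]


def meaning(M, b):
    """Extract candidate meaning b (a length-R code) from column b of M."""
    return [row[b] for row in M]


rng = random.Random(0)
R, B = 4, 2  # hypothetical: 4 attributes, 2 candidate meanings per word
pools = {w: make_meaning_pool(R, B, rng) for w in ["bank", "money", "river"]}
bank_meaning_0 = meaning(pools["bank"], 0)
```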

(21)

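B = 2 for the polysemous word 'bank'

$$\text{Bank} = \left[\begin{array}{cc} \text{attribute } 1 & \text{attribute } 1 \\ \text{attribute } 2 & \text{attribute } 2 \\ \vdots & \vdots \\ \text{attribute } R & \text{attribute } R \end{array}\right]$$

The left column is the 'money' meaning and the right column is the 'river' meaning.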

(22)

Predicting the next word's meaning

The code of the best-predicted meaning in M is used as the code of the next input word.
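The slides do not say how the "best predicted" meaning is chosen. One plausible reading, sketched below as an assumption, is to pick the column of M closest in Euclidean distance to the network's output; `best_meaning` and the 3 × 2 pool values are hypothetical.

```python
import math


def best_meaning(prediction, M):
    """Return (index, code) of the column of M closest to the network's prediction."""
    B = len(M[0])
    cols = [[row[b] for row in M] for b in range(B)]
    dists = [math.dist(prediction, c) for c in cols]
    b = min(range(B), key=dists.__getitem__)
    return b, cols[b]


# Hypothetical 3 x 2 pool for 'bank': column 0 ~ 'money', column 1 ~ 'river'
M_bank = [[0.9, 0.1],
          [0.8, 0.2],
          [0.1, 0.7]]
idx, code = best_meaning([0.2, 0.3, 0.6], M_bank)  # idx == 1, the 'river' column
```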

(23)

Renewed attributes

Update the network's weights after each word is presented, to reduce the prediction error.

After each training pass, the averaged prediction for a specific meaning of the next word is used as the renewed attributes of that meaning.
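Per-meaning renewal can be sketched by keying the accumulated predictions on (word, meaning index) rather than on the word alone. In this hypothetical sketch, `renew_meanings` and the event format are assumptions; columns of M that were never selected during the pass are left unchanged, which is also an assumption.

```python
def renew_meanings(events, pools):
    """events: (word, meaning_index, prediction) triples collected during one pass.
    Returns new pools in which each used column is replaced by its mean prediction."""
    sums, counts = {}, {}
    for word, b, pred in events:
        key = (word, b)
        if key in sums:
            sums[key] = [s + p for s, p in zip(sums[key], pred)]
        else:
            sums[key] = list(pred)
        counts[key] = counts.get(key, 0) + 1
    # Copy the pools so the previous pass's codes are not modified in place
    new_pools = {w: [row[:] for row in M] for w, M in pools.items()}
    for (word, b), total in sums.items():
        for r, s in enumerate(total):
            new_pools[word][r][b] = s / counts[(word, b)]
    return new_pools


pools = {"bank": [[0.9, 0.1],
                  [0.8, 0.2]]}
events = [("bank", 1, [0.25, 0.5]), ("bank", 1, [0.75, 1.0])]
new_pools = renew_meanings(events, pools)  # column 1 of 'bank' becomes [0.5, 0.75]
```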

(24)

Figure: Illustration of Elman network for multi‐code.


(25)

Experiments

Dream of the Red Chamber 紅樓夢

Romance of the Three Kingdoms 三國演義

(26)

Red Chamber contains more than 841 thousand characters and uses 5069 distinct Chinese characters.

Pick the 246 words (<5% of the distinct characters) whose frequency fq satisfies 300 ≤ fq ≤ 1200.
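The word-selection step amounts to a frequency-band filter over characters. A minimal sketch, assuming each Chinese character is treated as a word and that punctuation and whitespace are excluded via `str.isalpha` (the slides specify neither); `select_mid_frequency` is a hypothetical name.

```python
from collections import Counter


def select_mid_frequency(text, lo, hi):
    """Return characters whose count fq satisfies lo <= fq <= hi, most frequent first."""
    counts = Counter(ch for ch in text if ch.isalpha())
    return [ch for ch, fq in counts.most_common() if lo <= fq <= hi]


# Toy check; for Red Chamber the call would be select_mid_frequency(text, 300, 1200)
toy = select_mid_frequency("方方,成。", 1, 1)
```

The same helper covers the Three Kingdoms selection with bounds 225 and 525.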

(27)

Dream of the Red Chamber

246 words, 300 ≤ fq ≤ 1200


(28)

Figure: Training errors for different pool sizes. Colored vertical lines mark the pass with the minimum error.


(29)

Three Kingdoms contains more than 570 thousand characters and uses 5071 distinct Chinese characters.

Pick the 258 words whose frequency fq satisfies 225 ≤ fq ≤ 525.

(30)

Romance of the Three Kingdoms: 258 words, 225 ≤ fq ≤ 525


(31)

Figure: Training errors for different pool sizes. Colored vertical lines mark the pass with the minimum error.


(32)

Table: Characters that have multiple codes. The total number of meanings of each character is shown next to it.


(33)

Table: Sentences in Red Chamber which contain the same character with two different meanings, s=1 and s=4.


(34)

Table: Sentences in Three Kingdoms which contain the same character with two different meanings, s=2 and s=4.


(35)

Table: Samples of two names having multiple codes.


(36)

The number of meanings


(37)

Examples

• 話說王夫人見中秋已過,鳳姐病已比先減了,雖未大愈,然亦可出入行走得了,仍命大夫每日診脈服藥,又開了丸藥方(1)子來,配調經養榮丸。(meaning: 單子, a written slip or list)

• 說著,便袖了這石,同那道人飄然而去,竟不知投奔何方(2)何捨。(meaning: 方向, direction)

• 自取了筆硯紙墨出來,將方(3)才的詩,命她二人念著,遂從頭寫出來。(meaning: 剛剛, just now)

• 妙玉送至門外,看她們去遠,方(4)掩門進來。(meaning: 才, only then)

• 賈珍等拿了藥方(5)來,回明賈母原故,將藥方放在桌上出去,不在話下。(meaning: 帖, a prescription slip)


(38)

The number of meanings


(39)

Examples

• 是非成(1)敗轉頭空:青山依舊在,幾度夕陽紅。(meaning: 成功, success)

• 孔明曰:「曹操幼子曹植,字子建,下筆成(4)文。操嘗命作一賦,名曰銅雀臺賦。賦中之意,單道他家合為天子,誓取二喬。」(meaning: 形成, to form)

• 成(5)功不必添蛇足,討賊猶思奮虎威。(meaning: 勝利, victory)


(40)

Stylistic analysis

Authorship

(41)

Dream of the Red Chamber

Prediction error along each word:

9/28/2012 Cheng‐Yuan Liou 41

(42)

Romance of the Three Kingdoms

Prediction error along each word:


(43)

Summary

Context‐based method (Changing scenario)

Symbol–free sequence

The meaning of a learned attribute can be calibrated against similar words.

Predicting the next word (symbol) of a given word sequence.

(44)

Applications

Stylistic analysis

Authorship

Semantic indexing, ranking, and categorization

Internet

DNA, gene, or protein

Cryptography 

Ancient language, machine translation

(45)

SARS genome AY274119.3

White represents the largest error.

(46)

Influenza (1918)

ACCESSION: AF116575.1

(100)…GACACAGTACTCGAAAAGAATGTGACCGTGACACACTCTGTTAACCTGCTC…(150)

(100)…112121125353115155515555532555535353535552512355222…(150)

(500)…GGCTGACAAAGAAGGGAAGCTCATACCCAAAGCTTAGCAAGTCCTATGTGA…(550)

(500)…152211215111551551135352532351552251135112235155555…(550)

(1000)…GGACTAAGAAACATTCCATCTATTCAATCCAGGGGTCTATTTGGAGCCATT…(1050)

(1000)…155351555153525321535152215223551512225252155523525…(1050)

A: 1, 5; T: 2, 5; C: 2, 3; G: 1, 5
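If each digit is read as the index of the code (meaning) chosen for the nucleotide above it, which is an assumption since the slides do not define the digit strings, the legend can be recovered by collecting the distinct codes per nucleotide. The helper `codes_per_symbol` is hypothetical; the two strings are the positions 100 to 150 excerpt above.

```python
def codes_per_symbol(sequence, codes):
    """Map each symbol to the sorted list of distinct code indices assigned to it."""
    if len(sequence) != len(codes):
        raise ValueError("sequence and code string must be aligned")
    used = {}
    for sym, code in zip(sequence, codes):
        used.setdefault(sym, set()).add(int(code))
    return {sym: sorted(v) for sym, v in used.items()}


dna = "GACACAGTACTCGAAAAGAATGTGACCGTGACACACTCTGTTAACCTGCTC"
code = "112121125353115155515555532555535353535552512355222"
legend = codes_per_symbol(dna, code)
```

On this excerpt alone the result is A: [1, 5], T: [2, 5], C: [2, 3], G: [1, 5], consistent with the legend.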

(47)

Influenza (2009)

ACCESSION: FJ966082.1

(100)…GACACAGTACTAGAAAAGAATGTAACAGTAACACACTCTGTTAACCTTCTA…(150)

(100)…134343153453133331335153343153343434545155334455453…(150)

(500)…GGCTAGTTAAAAAAGGAAATTCATACCCAAAGCTCAGCAAATCCTACATTA…(550)

(500)…114531553333331133355435344433314543143335445343553…(550)

(1000)…GGATTGAGGAATATCCCGTCTATTCAATCTAGAGGCCTATTTGGGGCCATT…(1050)

(1000)…113551311335354441545355433545313114453555111144355…(1050)

A: 1, 3; T: 3, 5; C: 4; G: 1

(48)

Detailed techniques and settings are given in the IScIDE 2012 paper.

(49)

Museum of Cao Xueqin

Born and raised in Nanjing

1715 or 1724 to 1763 or 1764

• Former residence of Cao Xueqin (曹雪芹故居): the Jiangning Imperial Silk Manufacturing Office (江宁织造府), Daxinggong (大行宮)

(50)

Thanks

http://www.csie.ntu.edu.tw/~cyliou/
