Chapter 3 Research Methods

3.2 Computing Weight Values

1. Computing the occurrence frequency of each Concept:

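In Ontology_k, the ontology belonging to class k, we count for each Concept the number of documents of each class in which it does and does not appear. We then use a TF × IDF scheme [16] to compute the importance of each Concept in Ontology_k.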

$$W_{i,k} = DF_{i,k} \times \log \frac{N}{\sum_{c=1}^{C} DF_{i,c}} \tag{3-1}$$

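where

k: class k, with k ∈ {1...C}, where C is the total number of classes

i: a Concept i in Ontology_k, the ontology belonging to class k

$W_{i,k}$: the weight of Concept i in Ontology_k

$DF_{i,k}$: the number of documents of class k in which Concept i appears

N: the total number of training documents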
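The computation in formula (3-1) can be sketched in Python. This is a minimal illustration, not the thesis implementation: it assumes the formula reads W_{i,k} = DF_{i,k} × log(N / Σ_c DF_{i,c}), that the logarithm is natural (the text does not state the base), and that each training document has already been reduced to a class label plus the set of Concepts found in it. The function name `concept_weights` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def concept_weights(docs, concepts):
    """Compute W[i, k] = DF[i, k] * log(N / sum_c DF[i, c]).

    docs     : list of (class_label, set_of_concepts_found) pairs
    concepts : set of Concept names to score
    Returns a dict keyed by (concept, class_label).
    """
    n_docs = len(docs)  # N: total number of training documents

    # DF[i, k]: documents of class k that contain Concept i
    df = defaultdict(int)
    for label, present in docs:
        for concept in present & concepts:
            df[(concept, label)] += 1

    # Sum of DF[i, c] over all classes c
    total = defaultdict(int)
    for (concept, _label), count in df.items():
        total[concept] += count

    return {
        (concept, label): count * math.log(n_docs / total[concept])
        for (concept, label), count in df.items()
    }


# Hypothetical toy corpus: two classes, four documents.
sample_docs = [
    ("sports", {"ball", "team"}),
    ("sports", {"ball"}),
    ("finance", {"stock"}),
    ("finance", {"stock", "team"}),
]
sample_weights = concept_weights(sample_docs, {"ball", "team", "stock"})
```

Under this reading, a Concept that appears in every training document receives a weight of zero, which matches the usual IDF behaviour of discounting terms that do not discriminate between classes.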

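Formula (3-1) gives the weight of every Concept in each class's Ontology.

Besides the Concept weights, we next compute the strength of the semantic association between Concepts, which provides additional evidence for similarity judgments during classification.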

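[Figure 3-3. Elements shown: Training Data; natural-language preprocessing; Ontology_k; counting the documents in which Concept i of Ontology_k appears, both within class k and across all N documents of every class; formula (3-1).]

Figure 3-3 Workflow for computing the occurrence frequency of Concepts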

2. Assigning Concept weights according to occurrence frequency

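We use the Ontology structure to count, for each news class, the number of documents in which each Concept appears; formula (3-1) then gives the importance of each Concept. As shown in Figure 3-4, we divide the Concepts of a class into ten equal groups by importance and assign each group, in order, a weight from 0.1 to 1.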

[Figure 3-4. Example Concepts, numbered up to Concept 30, distributed over ten groups labelled 0.1, 0.2, ..., 0.9, 1.]

Figure 3-4 Assigning weights to Concepts
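The ten-group weighting can be sketched as follows. The text does not say whether the ten groups are equal in size (by rank) or equal in width (by score range); this sketch assumes equal-size groups by rank, and the function name `decile_weights` is hypothetical. Ties are broken by input order, which is also an assumption.

```python
def decile_weights(scores, n_bins=10):
    """Map raw importance scores to weights 0.1, 0.2, ..., 1.0 by rank.

    scores : dict mapping an item name to its importance score
    Items are sorted from least to most important and split into
    n_bins groups of (nearly) equal size.
    """
    ranked = sorted(scores, key=scores.get)  # least important first
    n = len(ranked)
    weights = {}
    for rank, name in enumerate(ranked):
        bin_index = rank * n_bins // n  # 0 .. n_bins - 1
        weights[name] = round((bin_index + 1) / n_bins, 1)
    return weights


# Hypothetical scores for 30 Concepts: Concept i has importance i.
example_scores = {f"Concept {i}": float(i) for i in range(1, 31)}
example_weights = decile_weights(example_scores)
```

With 30 Concepts, each of the ten weights is shared by three Concepts. When a class has fewer than ten items, some weight levels stay unused under this rule.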

3. Computing the semantic association strength between Concepts:

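In each class's Ontology, some Concepts are linked by relations, and these relations can serve as features when computing document similarity for classification. We therefore train on the relations between Concepts: for each Relation, we count how often the two Concepts it connects appear together in documents of that class. The more often they co-occur, the stronger their semantic association.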

$$RW_{i,j,k} = \frac{2 \times DF_{(i\&j),k}}{DF_{i,k} + DF_{j,k}} \tag{3-2}$$

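where

i: Concept i

j: Concept j

k: class k

$RW_{i,j,k}$: the weight of the relation between Concept i and Concept j in Ontology_k, the ontology belonging to class k

$DF_{i,k}$: the number of documents of class k in which Concept i appears ($DF_{j,k}$ is defined in the same way for Concept j)

$DF_{(i\&j),k}$: the number of documents of class k in which Concept i and Concept j appear together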
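A minimal Python sketch of the relation weight follows. It assumes formula (3-2) reads RW_{i,j,k} = 2 × DF_{(i&j),k} / (DF_{i,k} + DF_{j,k}), which has the form of the Dice coefficient, and that the documents of class k are given as sets of Concepts. The function name `relation_weight`, the zero-denominator convention, and the sample data are hypothetical.

```python
def relation_weight(class_docs, concept_i, concept_j):
    """Compute RW = 2 * DF_(i&j) / (DF_i + DF_j) for one class.

    class_docs : list of sets, each holding the Concepts found in one
                 document of the class
    """
    df_i = sum(1 for doc in class_docs if concept_i in doc)
    df_j = sum(1 for doc in class_docs if concept_j in doc)
    df_ij = sum(
        1 for doc in class_docs if concept_i in doc and concept_j in doc
    )
    if df_i + df_j == 0:
        return 0.0  # assumed convention: neither Concept occurs in the class
    return 2 * df_ij / (df_i + df_j)


# Hypothetical documents of a single class.
sports_docs = [
    {"ball", "team"},
    {"ball", "team"},
    {"ball"},
    {"team", "coach"},
    {"coach"},
]
rw_ball_team = relation_weight(sports_docs, "ball", "team")    # 2*2 / (3+3)
rw_ball_coach = relation_weight(sports_docs, "ball", "coach")  # 2*0 / (3+2)
rw_team_coach = relation_weight(sports_docs, "team", "coach")  # 2*1 / (3+2)
```

The value ranges from 0, when the two Concepts never co-occur, to 1, when every document that contains one also contains the other.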

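Formula (3-2) shows that, in Ontology_k, the closer the co-occurrence count $DF_{(i\&j),k}$ of Concept i and Concept j is to their individual document counts $DF_{i,k}$ and $DF_{j,k}$, the stronger their semantic relation and the higher the relation weight. Conversely, if Concept i or Concept j appears on its own far more often than $DF_{(i\&j),k}$, the semantic relation between Concept i and Concept j is weak.

[Figure 3-5. Elements shown: Training Data; natural-language preprocessing; Ontology_k; counting, over the documents of class k, the number of documents in which Concept i and Concept j of Ontology_k appear together and the number in which each appears separately; formula (3-2).]

Figure 3-5 Workflow for computing the importance of Relations

4. Assigning Relation weights according to occurrence frequency

We use the Ontology structure to count, for each news class, the number of documents in which each Relation appears; formula (3-2) then gives the importance of each Relation. As shown in Figure 3-6, we divide the Relations of a class into ten equal groups by importance and assign each Relation, in order, a weight from 0.1 to 1.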

[Figure 3-6. Example Relations, numbered up to Relation 33, distributed over ten groups labelled 0.1, 0.2, ..., 0.9, 1.]

Figure 3-6 Assigning weights to Relations

Through the training procedure above, an Ontology_k can be converted into a weighted Ontology_k′, as shown in Figures 3-7 and 3-8.

[Figure 3-7. News Ontology rooted at Domain Concept 0, with Concept 1, Concept 2, Concept 3, ..., Concept k and further Concepts m, n, p, q, r, s, t, each shown with its Attributes. Legend: Association, Generalization.]

Figure 3-7 The Ontology before training

[Figure 3-8. Weighted News Ontology rooted at Domain Concept 0. Each Concept (0, 1, 2, 3, ..., k and m, n, p, q, r, s, t) is shown with its Attributes and a Weight, and each edge carries a relation weight: RW_{1,0}, RW_{2,0}, RW_{3,0}, RW_{k,0}, RW_{m,2}, RW_{n,2}, RW_{p,2}, RW_{p,3}, RW_{q,k}, RW_{r,s}, RW_{s,n}, RW_{r,n}, RW_{r,m}. Legend: Association, Generalization.]

Figure 3-8 The weighted Ontology after training
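One way to picture the conversion from Ontology_k to the weighted Ontology_k′ is as a data structure in which every Concept and every Relation gains a weight field. The sketch below is only an illustration: the class names `Concept`, `Relation`, `Ontology` and the function `apply_weights` are hypothetical, and the thesis does not specify how the ontology is stored.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    name: str
    attributes: List[str] = field(default_factory=list)
    weight: Optional[float] = None  # unset before training


@dataclass
class Relation:
    source: str
    target: str
    kind: str  # "association" or "generalization"
    weight: Optional[float] = None  # unset before training


@dataclass
class Ontology:
    concepts: Dict[str, Concept]
    relations: List[Relation]


def apply_weights(
    ontology: Ontology,
    concept_w: Dict[str, float],
    relation_w: Dict[Tuple[str, str], float],
) -> Ontology:
    """Return a weighted copy of the ontology; the input is left unchanged."""
    concepts = {
        name: replace(c, weight=concept_w.get(name))
        for name, c in ontology.concepts.items()
    }
    relations = [
        replace(r, weight=relation_w.get((r.source, r.target)))
        for r in ontology.relations
    ]
    return Ontology(concepts, relations)


# Hypothetical two-Concept fragment of a news ontology.
news = Ontology(
    concepts={
        "Concept 0": Concept("Concept 0", ["Attributes 0"]),
        "Concept 1": Concept("Concept 1", ["Attributes 1"]),
    },
    relations=[Relation("Concept 1", "Concept 0", "generalization")],
)
weighted_news = apply_weights(
    news,
    {"Concept 0": 1.0, "Concept 1": 0.4},
    {("Concept 1", "Concept 0"): 0.7},
)
```

Returning a copy keeps the untrained ontology available, so the same structure can be re-weighted for each class k.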

Source: 基於 Ontology 架構之文件分類網路服務研究與建構 (pp. 32-37)
