
Automatically Constructing Multi-Relationship Fuzzy Concept Networks in Fuzzy Information Retrieval Systems

Yih-Jen Horng*, Shy-Ming Chen**, and Chia-Hoang Lee*

*Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

**Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

Abstract: In this paper, an intelligent fuzzy information retrieval system with an automatically constructed knowledge base is presented. The knowledge base is represented by a multi-relationship fuzzy concept network that clearly depicts the relationships between concepts and their associated relevance degrees. Based on the multi-relationship fuzzy concept network architecture, users of the fuzzy information retrieval system can submit a fuzzy contextual query that specifies the search context in the query formula. The fuzzy information retrieval system retrieves documents whose contents are relevant to the user’s query through the required relationships under the specified search context. The proposed fuzzy information retrieval method is more intelligent and more flexible than existing methods, since it can automatically construct knowledge bases (i.e., multi-relationship fuzzy concept networks) and it provides a contextual search capability that allows users to specify fuzzy contextual queries in a more flexible manner.

I. INTRODUCTION

In [4], Lucarella et al. proposed a knowledge-based document retrieval technique based on fuzzy set theory [8], where the knowledge base is represented by a fuzzy concept network. Through inference based on the links in a fuzzy concept network, the implicit relationships between concepts can be derived. However, since the fuzzy inference process must be performed every time a user submits a query, the method presented in [4] is not efficient enough. In [1], Chen et al. used relevance matrices to model fuzzy concept networks. By calculating the transitive closure of a relevance matrix, the implicit relevance degrees between concepts can be obtained in advance, so the fuzzy information retrieval system can process the user’s query more efficiently.

However, the information retrieval methods presented in [1] and [4] both assume that the link strengths between two concept nodes, or between a concept node and a document node, in a fuzzy concept network are specified by experts. This assumption may be impractical when the application domain contains a large number of concepts and documents, because constructing the corresponding knowledge base then requires a huge human effort.

Moreover, the information retrieval methods of [1] and [4] use only context-independent relationships; that is, they assume that the relationships between concepts remain the same in all cases. In fact, the relationship between two concepts may vary with the context. A more cooperative information retrieval system should also consider the possible context-dependent relationships between concepts and adopt the proper relationship when the user specifies a search context.

In this paper, we present a method to automatically construct multi-relationship fuzzy concept networks based on training documents.

The concept nodes in a constructed multi-relationship fuzzy concept network are originally related to each other by context-independent relationships. Then, the possible context-dependent relationships between concepts are derived by considering the concepts’ positions in the multi-relationship fuzzy concept network. We use four possible kinds of fuzzy relationships [3] to describe all possible context-independent and context-dependent relationships between concepts. The users of a fuzzy information retrieval system can submit a fuzzy contextual query in which the search context is involved. Documents are retrieved if they contain concepts that have a specified fuzzy relationship with the concepts contained in the user’s query under the search context.

The proposed information retrieval method is more intelligent and more flexible than the existing methods, since it can automatically construct the knowledge bases (i.e., multi-relationship fuzzy concept networks), and it evaluates the degree to which a document satisfies the users’ queries by considering both the context-independent and the context-dependent relationships between the concepts involved in the documents and the users’ queries under the specified search contexts.

II. A METHOD TO AUTOMATICALLY CONSTRUCT MULTI-RELATIONSHIP FUZZY CONCEPT NETWORKS

In this section, we introduce the definitions of multi-relationship fuzzy concept networks and present a method to automatically construct them. Based on the multi-relationship fuzzy concept network architecture, four possible relationships between concepts can be derived. The semantics of these fuzzy relationships, reviewed from [3], are as follows:

(1) Fuzzy positive association: relates concepts that have a fuzzy similar meaning in some contexts.

(2) Fuzzy negative association: relates concepts that have a fuzzy complementary, fuzzy incompatible, or fuzzy antonymous relationship in some contexts.

(3) Fuzzy generalization: a concept is regarded as a fuzzy generalization of another concept if it includes that concept in an analytic or partitive sense.

(4) Fuzzy specialization: the inverse of the fuzzy generalization relationship.

The fuzzy relationships between concepts introduced above are defined formally as follows [3]:

Definition 2.1: Let $C$ be a set of concepts in a multi-relationship fuzzy concept network. Then,

(1) Fuzzy positive association $P$ is a fuzzy relation $P: C \times C \to [0, 1]$ that is reflexive, symmetric, and max-*-transitive.

(2) Fuzzy negative association $N$ is a fuzzy relation $N: C \times C \to [0, 1]$ that is anti-reflexive, symmetric, and max-*-nontransitive.

(3) Fuzzy generalization $G$ is a fuzzy relation $G: C \times C \to [0, 1]$ that is anti-reflexive, anti-symmetric, and max-*-transitive.

(4) Fuzzy specialization $S$ is a fuzzy relation $S: C \times C \to [0, 1]$ that is anti-reflexive, anti-symmetric, and max-*-transitive.

In the following, we present the definition of the multi-relationship fuzzy concept networks.

Definition 2.2: A multi-relationship fuzzy concept network is denoted as MRFCN(E, L), where E is a set of nodes, each of which stands for a concept or a document, and L is a set of directed edges between nodes. If $e \in L$, then the directed edge $e$ has one of the following two forms:

(1) $c_i \xrightarrow{(\langle \mu_P, P\rangle,\ \langle \mu_N, N\rangle,\ \langle \mu_G, G\rangle,\ \langle \mu_S, S\rangle)} c_j$ means that the directed edge $e$ connects concept $c_i$ to concept $c_j$ with a four-tuple $(\langle \mu_P, P\rangle, \langle \mu_N, N\rangle, \langle \mu_G, G\rangle, \langle \mu_S, S\rangle)$, where $\mu_P$ indicates that there is a fuzzy positive association relationship between concepts $c_i$ and $c_j$ with degree $\mu_P$; $\mu_N$ indicates that there is a fuzzy negative association relationship between concepts $c_i$ and $c_j$ with degree $\mu_N$; $\mu_G$ indicates that concept $c_i$ is more general than concept $c_j$ with degree $\mu_G$; and $\mu_S$ indicates that concept $c_i$ is more special than concept $c_j$ with degree $\mu_S$; where $\mu_P, \mu_N, \mu_G, \mu_S \in [0, 1]$.

(2) $c_i \xrightarrow{\mu} d_j$ means that document $d_j$ has concept $c_i$ with the degree of strength $\mu$, where $\mu \in [0, 1]$.

0-7803-7293-X/01/$17.00 © 2001 IEEE. 2001 IEEE International Fuzzy Systems Conference.

Example 2.1: Assume that there is a multi-relationship fuzzy concept network as shown in Fig. 1, where $c_1, c_2, \ldots$ are concepts and $d_1$, $d_2$, $d_3$, and $d_4$ are documents.

Fig. 1. A multi-relationship fuzzy concept network.

From Fig. 1, we can see that concept $c_3$ is more general than concept c[?] with degree 0.8; concept $c_3$ is more general than concept $c_4$ with degree 0.7; concept $c_3$ has both a fuzzy positive association relationship and a fuzzy negative association relationship with concept $c_7$, with degrees 0.7 and 0.2, respectively; and document $d_2$ has concepts $c_1$, $c_2$, and $c_5$ with degrees of strength 0.8, 1, and 0.9, respectively.
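To make Definition 2.2 concrete, here is a minimal Python sketch of one possible in-memory representation of an MRFCN: concept-to-concept edges carry the four degrees $(\mu_P, \mu_N, \mu_G, \mu_S)$ and concept-to-document edges carry a single degree of strength. The class names `RelationTuple` and `MRFCN` and the dictionary layout are hypothetical choices for illustration, not part of the paper; only the edges quoted in Example 2.1 are loaded.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RelationTuple:
    """Degrees of the four fuzzy relationships carried by one concept-to-concept edge."""
    p: float = 0.0  # fuzzy positive association
    n: float = 0.0  # fuzzy negative association
    g: float = 0.0  # c_i is more general than c_j
    s: float = 0.0  # c_i is more special than c_j


def _check_degree(value: float) -> None:
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"degree {value} is outside [0, 1]")


@dataclass
class MRFCN:
    """Hypothetical container for edge form (1) and edge form (2) of Definition 2.2."""
    concept_edges: Dict[Tuple[str, str], RelationTuple] = field(default_factory=dict)
    document_edges: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_concept_edge(self, ci: str, cj: str, **degrees: float) -> None:
        for value in degrees.values():
            _check_degree(value)
        self.concept_edges[(ci, cj)] = RelationTuple(**degrees)

    def add_document_edge(self, ci: str, dj: str, strength: float) -> None:
        _check_degree(strength)
        self.document_edges[(ci, dj)] = strength

    def concepts_of(self, dj: str) -> Dict[str, float]:
        """Concepts that document dj has, with their degrees of strength."""
        return {c: mu for (c, d), mu in self.document_edges.items() if d == dj}


# Part of Example 2.1 (Fig. 1), using only the edges quoted in the text.
net = MRFCN()
net.add_concept_edge("c3", "c4", g=0.7)
net.add_concept_edge("c3", "c7", p=0.7, n=0.2)
for concept, strength in [("c1", 0.8), ("c2", 1.0), ("c5", 0.9)]:
    net.add_document_edge(concept, "d2", strength)
print(net.concepts_of("d2"))  # {'c1': 0.8, 'c2': 1.0, 'c5': 0.9}
```

Keeping both edge forms in plain dictionaries keyed by node pairs makes the later relevance-matrix construction a simple lookup.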

The multi-relationship fuzzy concept network is used as the knowledge base for document retrieval. The steps for automatically constructing a multi-relationship fuzzy concept network are shown in Fig. 2.

Fig. 2. The steps of constructing a multi-relationship fuzzy concept network: extract words from training documents → calculate the weights of words to concepts → calculate the relevance degrees between concepts and documents → calculate the relationships and the relevance degrees between concepts → construct the multi-relationship fuzzy concept network.

The weight of a word to a document is calculated using the normalized TF × IDF (i.e., term frequency multiplied by inverse document frequency) weighting method [6], [7].

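The weight $w\_word\_document(t, d_i)$ of word $t$ to document $d_i$ is calculated by the following formula:

$$w\_word\_document(t, d_i) = \frac{\left(0.5 + 0.5\,\frac{tf_{ti}}{\max_{k=1,2,\ldots,L} tf_{ki}}\right)\log\frac{N}{df_t}}{\max_{j=1,2,\ldots,L}\left[\left(0.5 + 0.5\,\frac{tf_{ji}}{\max_{k=1,2,\ldots,L} tf_{ki}}\right)\log\frac{N}{df_j}\right]}, \quad (1)$$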

where $tf_{ti}$ is the frequency of word $t$ in document $d_i$, $df_t$ is the number of documents containing word $t$, $L$ is the number of words contained in document $d_i$, and $N$ is the number of documents in the corpus. The larger the value of $w\_word\_document(t, d_i)$, the more important word $t$ is to document $d_i$. From formula (1), we can see that $w\_word\_document(t, d_i)$ is normalized, so its value lies between zero and one.
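The word-to-document weighting can be sketched in Python as below. This is an illustration under an assumed form of formula (1): an augmented term frequency $0.5 + 0.5\,tf/\max tf$ times $\log(N/df)$, divided by the largest such score in the same document so that every weight lands in $[0, 1]$. The function name `word_document_weights` and the token-list input format are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def word_document_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Assumed form of (1): augmented TF x IDF, normalized by the document's largest score."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    result = []
    for tokens in docs:
        tf = Counter(tokens)
        if not tf:
            result.append({})
            continue
        max_tf = max(tf.values())
        raw = {t: (0.5 + 0.5 * f / max_tf) * math.log(n_docs / df[t]) for t, f in tf.items()}
        top = max(raw.values())
        # If every word of the document occurs in all documents, all raw scores are 0.
        result.append({t: (r / top if top > 0 else 0.0) for t, r in raw.items()})
    return result


docs = [["fuzzy", "retrieval", "fuzzy"], ["fuzzy", "network"], ["concept", "network"]]
weights = word_document_weights(docs)
print(round(weights[0]["fuzzy"], 3), weights[0]["retrieval"])  # 0.492 1.0
```

Note how the rarer word "retrieval" gets the top weight in the first document even though "fuzzy" occurs twice there: the IDF factor outweighs the frequency factor.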

After the weights of words in documents are obtained, the weight $w\_word\_concept(t, c)$ of word $t$ to concept $c$ can be calculated by the following formula:

$$w\_word\_concept(t, c) = \frac{\sum_{d_i \in c} w\_word\_document(t, d_i)}{m}, \quad (2)$$

where m is the number of documents belonging to concept c. The weight of word t to concept c is normalized by the number of documents belonging to concept c since the number of documents belonging to each concept may be different in the corpus.
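Formula (2) is an average over the $m$ documents of a concept. A minimal Python sketch follows, assuming that a document which does not contain the word contributes a weight of 0. The helper `concept_vector`, which stacks these weights in vocabulary order, anticipates the vector representation of a concept; both function names and the dictionary input format are hypothetical.

```python
from typing import Dict, List, Sequence


def word_concept_weight(word: str, concept_docs: Sequence[str],
                        doc_weights: Dict[str, Dict[str, float]]) -> float:
    """Formula (2): mean of w_word_document(word, d) over the m documents of a concept.
    A document that does not contain the word contributes weight 0 (assumption)."""
    m = len(concept_docs)
    if m == 0:
        raise ValueError("a concept must contain at least one document")
    return sum(doc_weights[d].get(word, 0.0) for d in concept_docs) / m


def concept_vector(vocabulary: Sequence[str], concept_docs: Sequence[str],
                   doc_weights: Dict[str, Dict[str, float]]) -> List[float]:
    """Weights of every vocabulary word to the concept, in vocabulary order."""
    return [word_concept_weight(w, concept_docs, doc_weights) for w in vocabulary]


doc_weights = {"d1": {"fuzzy": 0.8, "query": 0.4}, "d2": {"fuzzy": 0.6}}
print(concept_vector(["fuzzy", "query", "network"], ["d1", "d2"], doc_weights))
```

Dividing by $m$ rather than by the number of documents that actually contain the word keeps concepts with many documents from being favoured, which is the normalization motivated in the text.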

Each concept in the multi-relationship fuzzy concept network contains several documents. If most of the words contained in concept $c$ also appear in document $d_i$ and these words have high weights in document $d_i$, then the weight of document $d_i$ to concept $c$ should be high. The weight $w\_document\_concept(d_i, c)$ of document $d_i$ to concept $c$ is calculated by the following formula:

$$w\_document\_concept(d_i, c) = \frac{\sum_{s=1}^{k} w\_word\_document(s, d_i)}{n}, \quad (3)$$

where $n$ is the number of words contained in concept $c$, and $k$ is the number of words that document $d_i$ and concept $c$ have in common; the sum in (3) runs over these $k$ shared words.
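A minimal Python sketch of formula (3), assuming that "the words contained in concept $c$" is given as an explicit collection of words (for example, the words with non-zero weight in the concept vector). The function name and input types are hypothetical.

```python
from typing import Dict, Iterable


def document_concept_weight(doc_word_weights: Dict[str, float],
                            concept_words: Iterable[str]) -> float:
    """Formula (3): add up w_word_document(s, d_i) over the k words that document d_i
    shares with concept c, then divide by n, the number of words in c."""
    words = set(concept_words)
    n = len(words)
    if n == 0:
        return 0.0
    shared = words.intersection(doc_word_weights)  # the k common words
    return sum(doc_word_weights[s] for s in shared) / n


doc = {"fuzzy": 0.9, "network": 0.5, "tree": 0.3}
print(document_concept_weight(doc, ["fuzzy", "network", "query", "index"]))
```

Here the document shares two of the concept's four words, so its weight is $(0.9 + 0.5)/4 = 0.35$; the unrelated word "tree" does not contribute.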

The concept $c_i$ can be represented by the vector

$$c_i = \langle w_{1i}, w_{2i}, \ldots, w_{hi} \rangle, \quad (4)$$

where $h$ is the number of words extracted by the word extractor from the training documents, and $w_{ji}$ is the weight of word $j$ to concept $c_i$.

In the following, we introduce the method for deciding the kinds of fuzzy relationships between concepts and their associated relevance degrees [5]. Concept $c_i$ is more general than concept $c_j$ when most of the words contained in concept $c_j$ are also contained in concept $c_i$, but most of the words contained in concept $c_i$ are not contained in concept $c_j$. However, if concepts $c_i$ and $c_j$ contain almost the same words, then the two concepts are similar.

The degree to which concept $c_j$ is contained in concept $c_i$ (i.e., the degree to which concept $c_i$ is more general than concept $c_j$) is denoted as $G(c_i, c_j)$ and is calculated by the following formula:

$$G(c_i, c_j) = \frac{\sum_{k=1}^{n} \min(w_{ki}, w_{kj})}{\sum_{k=1}^{n} w_{kj}}, \quad (5)$$

where $w_{ki}$ is the weight of word $k$ to concept $c_i$, $w_{kj}$ is the weight of word $k$ to concept $c_j$, and $n$ is the number of words in the word space.

From formula (5), we can see that the value of $G(c_i, c_j)$ is between zero and one. The larger the value of $G(c_i, c_j)$, the more general concept $c_i$ is than concept $c_j$. Let $\alpha$ be a threshold value, where $\alpha \in [0, 1]$. There is a generalization relationship from concept $c_i$ to concept $c_j$ when $G(c_i, c_j) \geq \alpha$ and $G(c_j, c_i) < \alpha$. If both $G(c_i, c_j)$ and $G(c_j, c_i)$ are larger than or equal to $\alpha$, then there is a fuzzy positive association relationship between concepts $c_i$ and $c_j$, and their relevance degree $P(c_i, c_j)$ is calculated by the following formula:


$$P(c_i, c_j) = P(c_j, c_i) = \min\big(G(c_i, c_j), G(c_j, c_i)\big). \quad (6)$$
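The decision procedure built on (5) and (6) can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the relevance degree reported for a generalization is taken to be $G(c_i, c_j)$ itself, and the specialization case is read off as the inverse of a generalization. The function names `generalization_degree` and `relate` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def generalization_degree(ci: Sequence[float], cj: Sequence[float]) -> float:
    """Formula (5): degree to which concept cj is contained in concept ci."""
    denominator = sum(cj)
    if denominator == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(ci, cj)) / denominator


def relate(ci: Sequence[float], cj: Sequence[float],
           alpha: float) -> Tuple[Optional[str], float]:
    """Classify the pair with threshold alpha; degrees for G/S are an assumption."""
    g_ij = generalization_degree(ci, cj)
    g_ji = generalization_degree(cj, ci)
    if g_ij >= alpha and g_ji >= alpha:
        return "P", min(g_ij, g_ji)  # formula (6)
    if g_ij >= alpha:
        return "G", g_ij  # ci is more general than cj
    if g_ji >= alpha:
        return "S", g_ji  # cj is more general than ci
    return None, 0.0


general = [0.9, 0.8, 0.7, 0.6]
narrow = [0.9, 0.8, 0.0, 0.0]
print(relate(general, narrow, 0.8))  # ('G', 1.0)
```

Every word of the narrow concept is covered by the general one, so $G$ is 1 in that direction, while only about 57% of the general concept is covered in the other direction.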


III. MODELING MULTI-RELATIONSHIP FUZZY CONCEPT NETWORKS

Although the multi-relationship fuzzy concept network explicitly describes the links between concepts and their associated relevance degrees, some implicit links are not revealed in it. To obtain the implicit links between concepts, we adopt the method of [1] and perform fuzzy inference by modeling the multi-relationship fuzzy concept network with a relevance matrix.

The relevance matrix represents the relevance degrees between concepts in a multi-relationship fuzzy concept network.

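Definition 3.2: A relevance matrix $V$ is a fuzzy matrix [2] whose rows and columns are labeled by the concepts $c_1, c_2, \ldots, c_n$, where the entry $v_{ij}$ is the relevance degree between concepts $c_i$ and $c_j$. Its max-min composition with itself is

$$V \circ V = \begin{pmatrix} \bigvee_{i=1}^{n}(v_{1i} \wedge v_{i1}) & \bigvee_{i=1}^{n}(v_{1i} \wedge v_{i2}) & \cdots & \bigvee_{i=1}^{n}(v_{1i} \wedge v_{in}) \\ \bigvee_{i=1}^{n}(v_{2i} \wedge v_{i1}) & \bigvee_{i=1}^{n}(v_{2i} \wedge v_{i2}) & \cdots & \bigvee_{i=1}^{n}(v_{2i} \wedge v_{in}) \\ \vdots & \vdots & \ddots & \vdots \\ \bigvee_{i=1}^{n}(v_{ni} \wedge v_{i1}) & \bigvee_{i=1}^{n}(v_{ni} \wedge v_{i2}) & \cdots & \bigvee_{i=1}^{n}(v_{ni} \wedge v_{in}) \end{pmatrix}, \quad (7)$$

where $\vee$ and $\wedge$ denote the maximum and minimum operators, respectively.

The relevance degrees of a single kind of fuzzy relationship $r \in \{P, N, G, S\}$ are represented by a relational relevance matrix

$$U_r = \begin{array}{c|cccc} & c_1 & c_2 & \cdots & c_n \\ \hline c_1 & u_{11} & u_{12} & \cdots & u_{1n} \\ c_2 & u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_n & u_{n1} & u_{n2} & \cdots & u_{nn} \end{array}$$

In the document descriptor matrix $T = [t_{ij}]$, $m$ is the number of documents, $n$ is the number of concepts, and $t_{ij}$ indicates the relevance degree between document $d_i$ and concept $c_j$, where $t_{ij} \in [0, 1]$, $1 \leq i \leq m$, and $1 \leq j \leq n$.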
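The max-min composition in (7) and the transitive closure that the text adopts from [1] can be sketched in Python as follows. The closure is computed here by the standard fixed-point iteration $R \leftarrow R \vee (R \circ R)$ until nothing changes; this is one common way to obtain a max-min transitive closure and is not claimed to be the exact procedure of [1]. The matrix is a hypothetical three-concept example.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Entry (i, j) is the max over k of min(a[i][k], b[k][j]), as in (7)."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(v: Matrix) -> Matrix:
    """Max-min transitive closure: keep merging R with R o R until nothing changes."""
    r = [row[:] for row in v]
    while True:
        r2 = max_min_compose(r, r)
        merged = [[max(x, y) for x, y in zip(row, row2)] for row, row2 in zip(r, r2)]
        if merged == r:
            return r
        r = merged


v = [[1.0, 0.8, 0.0],
     [0.0, 1.0, 0.6],
     [0.0, 0.0, 1.0]]
print(transitive_closure(v)[0][2])  # 0.6
```

The implicit link from the first to the third concept gets degree $\min(0.8, 0.6) = 0.6$, the weakest link on the only path between them. The loop terminates because entries only grow and can only take values already present in $V$.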

In a multi-relationship fuzzy concept network, the concepts may originally be linked by only two kinds of context-independent relationships. However, these two kinds of relationships alone are not enough to describe all relationships between concepts in the real world, because a pair of concepts may also have context-dependent relationships. In this paper, a "context" means a specific concept node in a multi-relationship fuzzy concept network whose descendant concept nodes may have positive association or negative association relationships. When concept $c_k$ is selected as the context, the fuzzy relationship between concept $c_i$ and concept $c_j$ and its relevance degree are decided by the following rules:

Rule 1: If concept $c_i$ is neither an ancestor nor a descendant of concept $c_j$, and concepts $c_i$ and $c_j$ have a nearest common ancestor $c_h$ that is a descendant of concept $c_k$, then concepts $c_i$ and $c_j$ have a fuzzy positive association relationship, where the relevance degree is $\min(v^*(c_h, c_i), v^*(c_h, c_j))$.

Rule 2: If concept $c_i$ is neither an ancestor nor a descendant of concept $c_j$, there is no fuzzy positive association link between concepts $c_i$ and $c_j$, and the nearest common ancestor of concepts $c_i$ and $c_j$ is concept $c_k$, then concepts $c_i$ and $c_j$ have a fuzzy negative association relationship, where the relevance degree is $\min(v^*(c_k, c_i), v^*(c_k, c_j))$.

Rule 3: If concept $c_i$ is an ancestor of concept $c_j$, then concept $c_i$ is a fuzzy generalization of concept $c_j$, where the relevance degree is $v^*(c_i, c_j)$.

Rule 4: If concept $c_i$ is a descendant of concept $c_j$, then concept $c_i$ is a fuzzy specialization of concept $c_j$, where the relevance degree is $v^*(c_j, c_i)$.
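The four rules can be sketched in Python as below, under simplifying assumptions that are not from the paper: the generalization links form a tree given as a hypothetical `parent` map, the closure degrees $v^*$ are supplied as a dictionary keyed by (ancestor, descendant) pairs, and existing fuzzy positive association links are given as a set of unordered concept pairs. All names and numbers are illustrative.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Degrees = Dict[Tuple[str, str], float]


def ancestors(c: str, parent: Dict[str, str]) -> List[str]:
    """Ancestors of c in a concept tree, nearest first."""
    out = []
    while c in parent:
        c = parent[c]
        out.append(c)
    return out


def nearest_common_ancestor(ci: str, cj: str, parent: Dict[str, str]) -> Optional[str]:
    anc_j = set(ancestors(cj, parent))
    return next((a for a in ancestors(ci, parent) if a in anc_j), None)


def contextual_relation(ci: str, cj: str, ck: str, parent: Dict[str, str],
                        v_star: Degrees,
                        positive_links: Set[FrozenSet[str]]) -> Tuple[Optional[str], float]:
    """Apply Rules 1-4 with concept ck selected as the search context."""
    if ci in ancestors(cj, parent):  # Rule 3
        return "G", v_star[(ci, cj)]
    if cj in ancestors(ci, parent):  # Rule 4
        return "S", v_star[(cj, ci)]
    ch = nearest_common_ancestor(ci, cj, parent)
    if ch is None:
        return None, 0.0
    if ck in ancestors(ch, parent):  # Rule 1: ch is a descendant of ck
        return "P", min(v_star[(ch, ci)], v_star[(ch, cj)])
    if ch == ck and frozenset((ci, cj)) not in positive_links:  # Rule 2
        return "N", min(v_star[(ck, ci)], v_star[(ck, cj)])
    return None, 0.0


parent = {"c2": "c1", "c3": "c1", "c4": "c2", "c5": "c2"}
v_star = {("c1", "c2"): 0.9, ("c1", "c3"): 0.7, ("c2", "c4"): 0.8,
          ("c2", "c5"): 0.6, ("c1", "c4"): 0.8, ("c1", "c5"): 0.6}
print(contextual_relation("c4", "c5", "c1", parent, v_star, set()))  # ('P', 0.6)
```

With $c_1$ as the context, siblings under $c_2$ become positively associated, while $c_2$ and $c_3$, whose nearest common ancestor is the context itself, become negatively associated unless a positive link already joins them.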


concepts are adopted by considering the search contexts. In the following, we propose a method for processing fuzzy contextual queries.

The users’ fuzzy contextual queries have the following format:

$$Q = \{c_T, (c_1, r_1, x_1), (c_2, r_2, x_2), \ldots, (c_n, r_n, x_n)\},$$

where $c_T$ is the context concept, $c_i$ is a concept of the multi-relationship fuzzy concept network, $r_i \in \{P, N, G, S\}$ is the desired fuzzy relationship of concept $c_i$ to the concepts contained in documents, $x_i \in [0, 1]$ indicates the desired relevance degree of the document with respect to concept $c_i$, $1 \leq i \leq n$, and $n$ is the number of concepts of the multi-relationship fuzzy concept network. In a user’s query $Q$, if $x_i = 0$, then the documents desired by the user do not possess concept $c_i$. If the user considers that certain concepts may be neglected (i.e., including them or not would have no substantial effect on the result), then the user does not have to assign fuzzy relationships or degrees of strength to such concepts in the query; the symbol “-” is used to label a neglected concept.

The user’s query $Q$ can be represented by a query descriptor vector $\bar{q}$, that is,

$$\bar{q} = \langle x_1, x_2, \ldots, x_n \rangle.$$

The query descriptor vector $\bar{q}$ is then expanded into the expanded query descriptor vector $\bar{q}'$, which adds more related concepts to the query formula so that the information retrieval system can retrieve more relevant documents. The query-vector expansion algorithm is as follows:

Query-Vector Expansion Algorithm:

for i = 1 to n do
    if r_i = P then
        for j = 1 to n do
            if the nearest common ancestor of concepts c_i and c_j is a descendant of concept c_T and U_P(c_i, c_j) > 0 then
                x'_j = max(x_j, U_P(c_i, c_j)),   if x_j ≠ "-"
                x'_j = U_P(c_i, c_j),             if x_j = "-"
    if r_i = N then
        for j = 1 to n do
            if the nearest common ancestor of concepts c_i and c_j is concept c_T and U_N(c_i, c_j) > 0 then
                x'_j = max(x_j, U_N(c_i, c_j)),   if x_j ≠ "-"
                x'_j = U_N(c_i, c_j),             if x_j = "-"
    if r_i = G then
        for j = 1 to n do
            if U_G(c_i, c_j) > 0 then
                x'_j = max(x_j, U_G(c_i, c_j)),   if x_j ≠ "-"
                x'_j = U_G(c_i, c_j),             if x_j = "-"
    if r_i = S then
        for j = 1 to n do
            if U_S(c_i, c_j) > 0 then
                x'_j = max(x_j, U_S(c_i, c_j)),   if x_j ≠ "-"
                x'_j = U_S(c_i, c_j),             if x_j = "-"
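The query-vector expansion can be sketched in Python as follows. Several choices here are assumptions for illustration: `None` stands in for the "-" label, the concept hierarchy is a tree given by a hypothetical `parent` map, the relational matrices $U_P, U_N, U_G, U_S$ are sparse dictionaries, a concept is not expanded from itself ($j \neq i$), and updates accumulate so that a later $c_i$ can only raise an already expanded $x'_j$.

```python
from typing import Dict, List, Optional, Sequence, Tuple

NEGLECTED = None  # stands in for the "-" label of a neglected concept


def ancestors(c: str, parent: Dict[str, str]) -> List[str]:
    """Ancestors of c in a concept tree, nearest first."""
    out = []
    while c in parent:
        c = parent[c]
        out.append(c)
    return out


def nearest_common_ancestor(a: str, b: str, parent: Dict[str, str]) -> Optional[str]:
    anc_b = set(ancestors(b, parent))
    return next((x for x in ancestors(a, parent) if x in anc_b), None)


def expand_query(concepts: Sequence[str],
                 relations: Sequence[Optional[str]],
                 degrees: Sequence[Optional[float]],
                 context: str,
                 parent: Dict[str, str],
                 u: Dict[str, Dict[Tuple[str, str], float]]) -> List[Optional[float]]:
    """Return the expanded degrees x'_1..x'_n of the query descriptor vector."""
    expanded = list(degrees)
    for i, ci in enumerate(concepts):
        r = relations[i]
        if r is None:  # neglected concept: nothing to expand from
            continue
        for j, cj in enumerate(concepts):
            degree = u[r].get((ci, cj), 0.0)
            if j == i or degree <= 0.0:
                continue
            if r in ("P", "N"):
                h = nearest_common_ancestor(ci, cj, parent)
                if h is None:
                    continue
                if r == "P" and context not in ancestors(h, parent):
                    continue  # P needs the common ancestor below the context
                if r == "N" and h != context:
                    continue  # N needs the context itself as common ancestor
            current = expanded[j]
            expanded[j] = degree if current is NEGLECTED else max(current, degree)
    return expanded


parent = {"c2": "c1", "c3": "c1", "c4": "c2", "c5": "c2"}
u = {"P": {("c4", "c5"): 0.6}, "N": {("c2", "c3"): 0.7}, "G": {}, "S": {}}
print(expand_query(["c1", "c2", "c3", "c4", "c5"],
                   [None, "N", None, "P", None],
                   [None, 0.9, None, 0.7, None],
                   "c1", parent, u))
# [None, 0.9, 0.7, 0.7, 0.6]
```

Two neglected concepts, $c_3$ and $c_5$, receive degrees through the negative and positive association matrices, while $c_1$ stays neglected because no relationship in the query reaches it.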

Let $x$ and $y$ be two values, where $x \in [0, 1]$ and $y \in [0, 1]$. The degree of similarity between $x$ and $y$ can be evaluated by the function $W$ [1]:

$$W(x, y) = 1 - |x - y|, \quad (8)$$

where $W(x, y) \in [0, 1]$. The larger the value of $W(x, y)$, the more similar $x$ and $y$ are.
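Formula (8) and the way it feeds the degree of satisfaction $DS(d_i)$ of a document can be sketched in Python as below: $DS(d_i)$ averages $W$ over the $k$ concepts that the expanded query does not neglect. Using `None` for the "-" label and passing a document as a plain list of concept degrees (one row of the document descriptor matrix) are assumptions for illustration; the function names are hypothetical.

```python
from typing import Optional, Sequence

NEGLECTED = None  # stands in for the "-" label of a neglected concept


def similarity(x: float, y: float) -> float:
    """Formula (8): W(x, y) = 1 - |x - y| for x, y in [0, 1]."""
    return 1.0 - abs(x - y)


def degree_of_satisfaction(doc_row: Sequence[float],
                           expanded_query: Sequence[Optional[float]]) -> float:
    """Average of W(s_ij, x_j) over the k concepts the query does not neglect."""
    pairs = [(s, x) for s, x in zip(doc_row, expanded_query) if x is not NEGLECTED]
    if not pairs:
        raise ValueError("the query neglects every concept")
    return sum(similarity(s, x) for s, x in pairs) / len(pairs)


print(degree_of_satisfaction([0.8, 0.3, 0.5], [1.0, NEGLECTED, 0.5]))
```

The neglected second concept is skipped entirely, so the result (approximately 0.9) averages only $W(0.8, 1.0) = 0.8$ and $W(0.5, 0.5) = 1$. A desired degree of 0 rewards documents that lack the concept, since $W(s, 0) = 1 - s$.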

Assume that the document descriptor relevance vector $\bar{D}_i$ (i.e., the $i$th row of the document descriptor matrix $T$) and the expanded query descriptor vector $\bar{q}'$ are represented as follows:

$$\bar{D}_i = \langle s_{i1}, s_{i2}, \ldots, s_{in} \rangle,$$

$$\bar{q}' = \langle x_1, x_2, \ldots, x_n \rangle,$$

where $s_{ij} \in [0, 1]$, $x_j \in [0, 1]$, $1 \leq j \leq n$, $1 \leq i \leq m$, $n$ is the number of concepts, and $m$ is the number of documents. Then, based on [1], the degree of satisfaction $DS(d_i)$ with which document $d_i$ satisfies the user’s query $Q$ can be evaluated as follows:

$$DS(d_i) = \frac{\sum_{j = 1, \ldots, n;\; x_j \neq \text{“-”}} W(s_{ij}, x_j)}{k}, \quad (9)$$

where $DS(d_i) \in [0, 1]$, $1 \leq i \leq m$, and $k$ is the number of concepts not neglected by the user’s query. The larger the value of $DS(d_i)$, the better document $d_i$ satisfies the user’s query.

IV. CONCLUSIONS

In this paper, we have presented a method to automatically construct multi-relationship fuzzy concept networks for fuzzy information retrieval systems, where the multi-relationship fuzzy concept networks are used as the knowledge bases for fuzzy information retrieval. Relevance matrices are used to model multi-relationship fuzzy concept networks, and the implicit relevance degrees between concepts can be obtained by calculating the transitive closure of the relevance matrices. Based on the multi-relationship fuzzy concept network architecture, four possible context-independent and context-dependent fuzzy relationships between concepts can be derived. We then use four kinds of relational relevance matrices to represent the relevance degrees between concepts for these four kinds of fuzzy relationships. The proposed fuzzy information retrieval method is more intelligent and more flexible than the existing methods, since it can automatically construct knowledge bases (i.e., multi-relationship fuzzy concept networks) and it provides a contextual search capability that allows users to specify fuzzy contextual queries in a more flexible manner.

REFERENCES

[1] S. M. Chen and J. Y. Wang, “Document retrieval using knowledge-based fuzzy information retrieval techniques,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 25, no. 6, pp. 793-803, May 1995.

[2] A. Kandel, Fuzzy Mathematical Techniques with Applications. CA: Addison-Wesley, 1986.

[3] M. Kracker, “A fuzzy concept network model and its applications,” in Proceedings of the First IEEE International Conference on Fuzzy Systems, USA, March 1992, pp. 761-768.

[4] D. Lucarella and R. Morara, “FIRST: Fuzzy information retrieval system,” Journal of Information Science, vol. 17, no. 1, pp. 81-91, 1991.

[5] G. Salton, The SMART retrieval system: experiments in automatic document processing. New Jersey: Prentice Hall, 1971.

[6] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513-523, 1988.

[7] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.

[8] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, pp. 338-353, 1965.
