Fuzzy query processing for document retrieval based on extended fuzzy concept networks


This paper presents a method for fuzzy query processing for document retrieval based on extended fuzzy concept networks. In an extended fuzzy concept network, there are four kinds of fuzzy relationships between concepts: fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization. An extended fuzzy concept network can be modeled by a relation matrix and a relevance matrix, where the elements of the relation matrix represent the fuzzy relationships between concepts, and the elements of the relevance matrix indicate the degrees of relevance between concepts. The implicit fuzzy relationships between concepts can be inferred from the transitive closure of the relation matrix, and the implicit degrees of relevance between concepts can be inferred from the transitive closure of the relevance matrix. The proposed method is more flexible than the ones presented in [8] and [17] because it allows users to perform positive queries, negative queries, generalization queries, and specialization queries, so that fuzzy queries can be handled in a more flexible and more intelligent manner.

Index Terms— Document retrieval, extended fuzzy concept networks, fuzzy query processing, relation matrix, relevance matrix.

I. INTRODUCTION

In [24], Salton et al. pointed out that an information retrieval system is a system used to store items of information that need to be processed, searched, retrieved, and disseminated to various user populations. The primary purpose of an information retrieval system is to help users acquire information efficiently [8]. Most commercial information retrieval systems still adopt the Boolean logic model. These systems are based on the assumption that documents can be precisely described by sets of index terms and that the information needed by users can be represented by Boolean search requests. However, information retrieval systems based on the Boolean logic model are rather restricted in their applications because they are unable to represent uncertain information; when uncertain information is present, their query processing does not handle it properly [8]. In recent years, several fuzzy information retrieval methods based on fuzzy set theory [27] have been proposed to overcome this disadvantage of the Boolean logic model, such as [8], [9], [12], [17]–[21], [25], and [28].

In [8], we presented a knowledge-based fuzzy information retrieval method to deal with document retrieval, where concept matrices are used for knowledge representation, and simple queries, weighted queries, interval queries, and weighted-interval queries are allowed for document retrieval. In [9], Ke et al. presented a fuzzy information retrieval system model for document retrieval. In [12], Kamel et

Manuscript received January 6, 1997; revised August 31, 1997. This work was supported in part by the National Science Council, Republic of China, under Grant NSC 86-2213-E-009-018.

S.-M. Chen is with the Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

Y.-J. Horng is with the Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

Publisher Item Identifier S 1083-4419(99)00907-3.

theory of fuzzy sets. In [21], Radecki presented a fuzzy set theoretical approach to document retrieval. In [25], Tahani presented a fuzzy model of document retrieval systems. In [28], Zemankova presented a fuzzy intelligent information system (FIIS). However, either the efficiency or the effectiveness of these methods is unsatisfactory. Thus, there is an increasing demand for a more powerful fuzzy information retrieval method to deal with document retrieval.

In [8], we presented a method to deal with document retrieval based on concept networks [17], where concept matrices are used for modeling concept networks. The method presented in [8] is more flexible than the ones presented in [9] and [17] because it can deal with interval queries and weighted-interval queries. However, there is only one kind of fuzzy relationship between concepts in the concept networks presented in [8] and [17], i.e., the fuzzy positive association relation. If we can provide more kinds of fuzzy relationships between concepts in a concept network, then there is room for more flexibility. In [14], Kracker presented a fuzzy concept network model which has four kinds of fuzzy relationships between concepts (i.e., fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization) for supporting database queries. In this paper, we generalize the definitions of fuzzy concept networks presented in [8], [11], and [17] to propose the concept of extended fuzzy concept networks based on [14]. We also present a new method for document retrieval based on the extended fuzzy concept networks. In an extended fuzzy concept network, there are four kinds of fuzzy relationships between concepts, i.e., fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization. An extended fuzzy concept network can be modeled by a relation matrix and a relevance matrix, where the elements in a relation matrix represent the fuzzy relationships between concepts, and the elements in a relevance matrix indicate the degrees of relevance between concepts. The implicit fuzzy relationships between concepts can be inferred by the transitive closure of the relation matrix. The implicit degrees of relevance between concepts also can be inferred by the transitive closure of the relevance matrix.
The proposed method is more flexible than the ones presented in [8] and [17] due to the fact that it allows the users to perform positive queries, negative queries, generalization queries, and specialization queries. The proposed method allows the users to perform fuzzy queries in a more flexible and more intelligent manner.

The rest of this paper is organized as follows. In Section II, we briefly review the definitions of concept networks from [8] and [17]. In Section III, we present the definitions of extended fuzzy concept networks. In Section IV, we use relation matrices and relevance matrices to model extended fuzzy concept networks. In Section V, we propose a new method for document retrieval based on extended fuzzy concept networks. The conclusions are discussed in Section VI.


Fig. 1. A concept network.

II. CONCEPT NETWORKS

In [17], Lucarella et al. proposed concept networks for fuzzy information retrieval. A concept network includes nodes and directed links, where each node represents a concept or a document; each directed link connects two concepts or leads from a concept $C_i$ to a document $d_j$ and is labeled with a real value between zero and one. If $C_i \xrightarrow{\mu} C_j$, then the degree of relevance from concept $C_i$ to concept $C_j$ is $\mu$, where $\mu \in [0, 1]$. If $C_i \xrightarrow{\mu} d_j$, then the degree of relevance of document $d_j$ with respect to concept $C_i$ is $\mu$, where $\mu \in [0, 1]$. For example, Fig. 1 shows a concept network, where $C_1, C_2, \ldots, C_7$ are concepts and $d_1, d_2, d_3$, and $d_4$ are documents.

From Fig. 1, we can see that document $d_2$ can be expressed as a fuzzy subset of concepts, where

$$d_2 = \{(C_1, 0.6), (C_2, 1), (C_5, 0.8)\}.$$

A concept network is assumed to consist of $n$ nodes and some directed links between concepts. Let $C$ be a set of concepts, $C = \{C_1, C_2, \ldots, C_n\}$, and let the value associated with the directed link from concept $C_i$ to concept $C_j$ be denoted by $F(C_i, C_j)$, where $F$ is a mapping function, $F: C \times C \to [0, 1]$. If the relevance value from concept $C_i$ to concept $C_j$ is $F(C_i, C_j)$, and if the relevance value from concept $C_j$ to concept $C_k$ is $F(C_j, C_k)$, then the relevance value from concept $C_i$ to concept $C_k$ can be evaluated by the following expression:

$$F(C_i, C_k) = \min(F(C_i, C_j), F(C_j, C_k)). \tag{1}$$

Similarly, if $F(C_1, C_2), F(C_2, C_3), \ldots$, and $F(C_{n-1}, C_n)$ are known, then we can get

$$F(C_1, C_n) = \min(F(C_1, C_2), F(C_2, C_3), \ldots, F(C_{n-1}, C_n)). \tag{2}$$

In a concept network, each document has a different relevance value with respect to each concept. The document descriptor [8] for the document $d_j$ is defined as a fuzzy subset of the collection of concepts by the following expression:

$$d_j = \{(C_i, f_{d_j}(C_i)) \mid C_i \in C\}$$

where $f_{d_j}(C_i)$, $f_{d_j}: C \to [0, 1]$, represents the degree of relevance of document $d_j$ with respect to concept $C_i$. Each user's query can be represented by a query descriptor $Q$ expressed as a fuzzy subset of the collection of concepts by the following expression:

$$Q = \{(C_i, f_Q(C_i)) \mid C_i \in C\}$$

where $f_Q(C_i)$, $f_Q: C \to [0, 1]$, represents the relevance value of the query descriptor $Q$ with respect to the concept $C_i$.

Fig. 2. A concept network of Example 2.1.

Example 2.1: Assume that the concept network shown in Fig. 2 consists of four documents $d_1, d_2, d_3, d_4$ and seven concepts $C_1, C_2, \ldots, C_7$. If the query descriptor $Q$ is

$$Q = \{(C_3, 1.0)\}$$

where 1.0 represents the relevance value of the query descriptor $Q$ with respect to the concept $C_3$, then the relevance value of document $d_2$ with respect to concept $C_3$ can be evaluated. From Fig. 2, we can see that there are three different routes which can be applied for determining the relevance value of document $d_2$ with respect to the concept $C_3$:

1) The first route is $C_3 \to C_1 \to d_2$. Based on [17], the relevance value of document $d_2$ with respect to concept $C_3$ can be determined as follows:

$$\min(0.7, 0.6) = 0.6.$$

2) The second route is $C_3 \to C_4 \to C_2 \to d_2$. Based on [17], the relevance value of document $d_2$ with respect to concept $C_3$ can be determined as follows:

$$\min(0.9, 0.9, 1) = 0.9.$$

3) The third route is $C_3 \to C_4 \to C_5 \to d_2$. Based on [17], the relevance value of document $d_2$ with respect to concept $C_3$ can be determined as follows:

$$\min(0.9, 0.5, 0.8) = 0.5.$$

Then, based on [17], we can see that the relevance value of the document $d_2$ with respect to the concept $C_3$ is

$$\max(0.6, 0.9, 0.5) = 0.9.$$
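The route evaluation of Example 2.1 can be sketched in a few lines of Python. This is only an illustration: the dictionary `edges` and the function names are hypothetical, and the link values are the ones quoted in the three routes above, not a full encoding of Fig. 2.

```python
# Link values quoted in Example 2.1 (hypothetical encoding; only the
# links that appear on the three routes are listed).
edges = {
    ("C3", "C1"): 0.7, ("C1", "d2"): 0.6,
    ("C3", "C4"): 0.9, ("C4", "C2"): 0.9, ("C2", "d2"): 1.0,
    ("C4", "C5"): 0.5, ("C5", "d2"): 0.8,
}


def route_strength(route):
    """Relevance along one route: the minimum link value, as in (2)."""
    return min(edges[(a, b)] for a, b in zip(route, route[1:]))


def relevance(routes):
    """Relevance over several routes: the maximum of the route strengths."""
    return max(route_strength(r) for r in routes)


routes = [
    ["C3", "C1", "d2"],
    ["C3", "C4", "C2", "d2"],
    ["C3", "C4", "C5", "d2"],
]
strengths = [route_strength(r) for r in routes]
result = relevance(routes)
print(strengths, result)
```

Under these assumptions the three route strengths are 0.6, 0.9, and 0.5, and their maximum, 0.9, is the relevance value of $d_2$ with respect to $C_3$.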

The reasoning procedure should be repeated $n$ times if there are $n$ documents. However, there is only one kind of fuzzy relationship between concepts in the concept networks presented in [8] and [17], i.e., the fuzzy positive association relation. If we can provide more kinds of relationships between concepts in a concept network, then there is room for more flexibility. In Section III, we generalize concept networks to extended fuzzy concept networks, which allow four kinds of fuzzy relationships between concepts, i.e., fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization. A more powerful knowledge representation capability is consequently provided.


complementary, fuzzy incompatible or fuzzy antonyms.

3) Fuzzy generalization: a concept is regarded as a fuzzy generalization of another concept if it includes that concept in an analytic or partitive sense.

4) Fuzzy specialization is the inverse of the fuzzy generalization relationship.

The fuzzy relationships between concepts introduced above are described formally as follows.

Definition 3.1: Let $C$ be a set of concepts. Then

1) fuzzy positive association $P$ is a fuzzy relation, $P: C \times C \to [0, 1]$, which is reflexive, symmetric, and max-$*$-transitive;

2) fuzzy negative association $N$ is a fuzzy relation, $N: C \times C \to [0, 1]$, which is anti-reflexive, symmetric, and max-$*$-nontransitive;

3) fuzzy generalization $G$ is a fuzzy relation, $G: C \times C \to [0, 1]$, which is anti-reflexive, antisymmetric, and max-$*$-transitive;

4) fuzzy specialization $S$ is a fuzzy relation, $S: C \times C \to [0, 1]$, which is anti-reflexive, antisymmetric, and max-$*$-transitive.

Furthermore, the following restrictions hold [14].

1) $P(c_i, c_j) \neq 0 \rightarrow N(c_i, c_j) = 0$ and $G(c_i, c_j) = 0$ and $S(c_i, c_j) = 0$ and $P(c_j, c_i) = P(c_i, c_j)$;

2) $N(c_i, c_j) \neq 0 \rightarrow P(c_i, c_j) = 0$ and $G(c_i, c_j) = 0$ and $S(c_i, c_j) = 0$ and $N(c_j, c_i) = N(c_i, c_j)$;

3) $G(c_i, c_j) \neq 0 \rightarrow P(c_i, c_j) = 0$ and $N(c_i, c_j) = 0$ and $S(c_i, c_j) = 0$ and $S(c_j, c_i) = G(c_i, c_j)$;

4) $S(c_i, c_j) \neq 0 \rightarrow P(c_i, c_j) = 0$ and $N(c_i, c_j) = 0$ and $G(c_i, c_j) = 0$ and $G(c_j, c_i) = S(c_i, c_j)$;

for every $c_i, c_j \in C$.
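As an illustration, the restrictions above can be checked mechanically once a network is stored as a matrix of relationship labels and a matrix of degrees. The sketch below is an assumption-laden example, not part of the method: the function `satisfies_restrictions` and the table `INVERSE` are hypothetical names, the label "Z" (relationship not defined) is the one introduced later in Definition 4.3, and the test data are the matrices $R$ and $V$ of Example 4.1. Storing a single label per concept pair already guarantees that at most one of $P$, $N$, $G$, $S$ is nonzero, so only the symmetry and inverse conditions remain to be checked.

```python
# Hypothetical helper: P and N must be symmetric, G and S must be inverse
# to each other, and the degree must be the same in both directions.
INVERSE = {"P": "P", "N": "N", "G": "S", "S": "G", "Z": "Z"}


def satisfies_restrictions(R, V):
    """Return True if the label matrix R and degree matrix V obey the
    symmetry/inverse restrictions of Definition 3.1."""
    n = len(R)
    for i in range(n):
        for j in range(n):
            if R[j][i] != INVERSE[R[i][j]]:
                return False
            if V[j][i] != V[i][j]:
                return False
    return True


# Matrices of Example 4.1.
R = [
    ["P", "S", "S", "Z", "N"],
    ["G", "P", "Z", "Z", "Z"],
    ["G", "Z", "P", "S", "Z"],
    ["Z", "Z", "G", "P", "Z"],
    ["N", "Z", "Z", "Z", "P"],
]
V = [
    [1.0, 0.7, 0.5, 0.0, 0.8],
    [0.7, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.6, 0.0],
    [0.0, 0.0, 0.6, 1.0, 0.0],
    [0.8, 0.0, 0.0, 0.0, 1.0],
]
ok = satisfies_restrictions(R, V)

# A deliberately inconsistent copy: c2 and c1 both claim to specialize
# each other, which violates restriction 4).
broken = [row[:] for row in R]
broken[1][0] = "S"
bad = satisfies_restrictions(broken, V)
print(ok, bad)
```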

In the following, we present the definition of extended fuzzy concept networks.

Definition 3.2: An extended fuzzy concept network consists of nodes and directed links. Each node represents a concept or a document. Each directed link connects two concepts or leads from a concept $c_i$ to a document $d_j$. If

1) $c_i \xrightarrow{(\mu, P)} c_j$, then there is a positive association relationship between concept $c_i$ and concept $c_j$, and the relevance degree is $\mu$, where $\mu \in [0, 1]$.

2) $c_i \xrightarrow{(\mu, N)} c_j$, then there is a negative association relationship between concept $c_i$ and concept $c_j$, and the relevance degree is $\mu$, where $\mu \in [0, 1]$.

3) $c_i \xrightarrow{(\mu, G)} c_j$, then concept $c_i$ is more general than concept $c_j$, and the degree of generalization is $\mu$, where $\mu \in [0, 1]$.

4) $c_i \xrightarrow{(\mu, S)} c_j$, then concept $c_i$ is more special than concept $c_j$, and the degree of specialization is $\mu$, where $\mu \in [0, 1]$.

5) $c_i \xrightarrow{(\mu, P)} d_j$, then there is a positive association relationship between concept $c_i$ and document $d_j$, and the relevance degree is $\mu$, where $\mu \in [0, 1]$ (i.e., document $d_j$ possesses concept $c_i$ with the degree $\mu \times 100\%$).

6) $c_i \xrightarrow{(\mu, N)} d_j$, then there is a negative association relationship between concept $c_i$ and document $d_j$, and the relevance degree is $\mu$, where $\mu \in [0, 1]$ (i.e., document $d_j$ possesses the concept which is $\mu \times 100\%$ complementary with the concept $c_i$).

7) $c_i \xrightarrow{(\mu, G)} d_j$, then there is a generalization relationship between concept $c_i$ and document $d_j$, and the relevance degree is $\mu$, where $\mu \in [0, 1]$ and concept $c_i$ is more general than the concept possessed by document $d_j$ with the degree of $\mu \times 100\%$.

8) $c_i \xrightarrow{(\mu, S)} d_j$, then there is a specialization relationship between concept $c_i$ and document $d_j$, and the relevance degree is $\mu$, where $\mu \in [0, 1]$ and concept $c_i$ is more special than the concept possessed by document $d_j$ with the degree of $\mu \times 100\%$.

Every directed link in an extended fuzzy concept network is labeled with a pair of values $(\mu, FR)$, where $\mu$ denotes the degree of relevance, $\mu \in [0, 1]$, and $FR$ denotes the fuzzy relationship between concept $c_i$ and concept $c_j$ or between concept $c_i$ and document $d_j$, where $FR \in \{P, N, G, S\}$.

Fig. 3. An extended fuzzy concept network.

Example 3.1: Assume that there is an extended fuzzy concept network as shown in Fig. 3, where $c_1, c_2, \ldots, c_7$ are concepts and $d_1, d_2, d_3$, and $d_4$ are documents. Then we can see that document $d_2$ possesses 50% of concept $c_1$ and 80% of concept $c_5$, and that document $d_2$ possesses the concept which is 100% complementary with the concept $c_2$.

In an extended fuzzy concept network, if the relevance degree between concept $c_i$ and concept $c_j$ is $\mu_{ij}$, where $\mu_{ij} \in [0, 1]$, and if the relevance degree between concept $c_j$ and concept $c_k$ is $\mu_{jk}$, where $\mu_{jk} \in [0, 1]$, then the relevance degree $\mu_{ik}$ between concept $c_i$ and concept $c_k$ can be calculated as follows:

$$\mu_{ik} = \min(\mu_{ij}, \mu_{jk}) \tag{3}$$

where $\mu_{ik} \in [0, 1]$. Furthermore, if the relevance degree between concept $c_1$ and concept $c_2$ is $\mu_{12}$, the relevance degree between concept $c_2$ and concept $c_3$ is $\mu_{23}, \ldots$, and the relevance degree between concept $c_{n-1}$ and concept $c_n$ is $\mu_{(n-1)n}$, where $\mu_{12} \in [0, 1]$, $\mu_{23} \in [0, 1], \ldots$, and $\mu_{(n-1)n} \in [0, 1]$, then the relevance degree between concept $c_1$ and concept $c_n$ is $\mu_{1n}$, where $\mu_{1n} \in [0, 1]$ and

$$\mu_{1n} = \min(\mu_{12}, \mu_{23}, \ldots, \mu_{(n-1)n}). \tag{4}$$

In an extended fuzzy concept network, if the fuzzy relationship between concept $c_i$ and concept $c_j$ is $FR_{ij}$, and if the fuzzy relationship between concept $c_j$ and concept $c_k$ is $FR_{jk}$, then the fuzzy relation $FR_{ik}$ between concept $c_i$ and concept $c_k$ can be obtained by Table I, where P, N, G, and S stand for fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization, respectively.

TABLE I
THE COMBINATION OF FUZZY RELATIONSHIPS

In Table I, the first row shows the four possible fuzzy relationships of $FR_{ij}$, and the first column shows the four possible fuzzy relationships of $FR_{jk}$. The other elements in the table are the combination of $FR_{ij}$ and $FR_{jk}$. From Table I, we can see that the combination of two relationships of the same type results in a relationship of this type, except for negative associations (N), which yield positive associations (P). In Table I, we let these four kinds of fuzzy relationships have different priorities, i.e., the negative association (N) has the highest priority, generalization (G) and specialization (S) have lower priority, and the priority of the positive association (P) is the lowest. In Table I, the combination of a high priority relationship and a low priority relationship results in the relationship of high priority, except for the combination of generalization (G) and specialization (S), which results in positive association (P).
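The combination rules just described can be written as a small function. The sketch below is reconstructed only from the prose rules stated above (same type, priority, and the G/S exception); the names `PRIORITY` and `combine` are hypothetical, and the numeric priority values are arbitrary as long as they respect the stated order.

```python
# Priorities as described in the text: N highest, then G and S, then P.
# The numbers themselves are an arbitrary (assumed) encoding of that order.
PRIORITY = {"N": 3, "G": 2, "S": 2, "P": 1}


def combine(fr_ij, fr_jk):
    """Combine the relationship c_i -> c_j with the relationship c_j -> c_k."""
    if fr_ij == fr_jk:
        # Same type keeps its type, except that N combined with N gives P.
        return "P" if fr_ij == "N" else fr_ij
    if {fr_ij, fr_jk} == {"G", "S"}:
        # Generalization combined with specialization gives P.
        return "P"
    # Otherwise the relationship with the higher priority wins.
    return fr_ij if PRIORITY[fr_ij] > PRIORITY[fr_jk] else fr_jk


# Print the full combination table implied by the rules.
labels = ["P", "N", "G", "S"]
table = {(a, b): combine(a, b) for a in labels for b in labels}
for b in labels:
    print(b, [table[(a, b)] for a in labels])
```

Because the rules are symmetric in their two arguments, the resulting table is symmetric as well.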

In order to describe the different relevance degrees and fuzzy relationships between documents and concepts, we can represent the documents by extended fuzzy sets which are fuzzy subsets of the set of concepts, where extended fuzzy sets are the generalization of fuzzy sets [27]. For example, let $C$ be a set of concepts. Then, a document $d_j$ can be represented as follows:

$$d_j = \{(c_i, \langle v_{d_j}(c_i), r_{d_j}(c_i) \rangle) \mid c_i \in C\}$$

where $v_{d_j}(c_i)$ represents the relevance degree between document $d_j$ and concept $c_i$, $v_{d_j}: C \to [0, 1]$, and $r_{d_j}(c_i)$ stands for the fuzzy relationship between the document $d_j$ and the concept $c_i$, $r_{d_j}: C \to \{P, N, G, S\}$.

A user's query $Q$ also can be represented by an extended fuzzy set shown as follows:

$$Q = \{(c_i, \langle v_Q(c_i), r_Q(c_i) \rangle) \mid c_i \in C\}$$

where $v_Q(c_i)$ represents the relevance degree between the query $Q$ and concept $c_i$, $v_Q: C \to [0, 1]$, and $r_Q(c_i)$ stands for the fuzzy relationship between the query $Q$ and concept $c_i$, $r_Q: C \to \{P, N, G, S\}$.

IV. RELATION MATRICES AND RELEVANCE MATRICES

In this section, we present the definitions of relation matrices and relevance matrices which can be used to model the extended fuzzy concept networks. The definitions of the transitive closure of relation matrices and the transitive closure of relevance matrices are also presented.

TABLE II

THE COMBINATION OF FUZZY RELATIONSHIPS IN RELATION MATRICES

Definition 4.1: A relevance matrix $V$ is a fuzzy matrix [13], where the element $V(c_i, c_j)$ represents the relevance degree between concepts $c_i$ and $c_j$, and $V(c_i, c_j) \in [0, 1]$. If $V(c_i, c_j) = 0$, then the relevance degree between concept $c_i$ and concept $c_j$ is not defined explicitly by the experts.

Definition 4.2: Assume that $V$ is a relevance matrix

$$V = \begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1n} \\
v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
v_{n1} & v_{n2} & \cdots & v_{nn}
\end{bmatrix}$$

where $n$ is the number of concepts, $v_{ij} \in [0, 1]$, $1 \le i \le n$, and $1 \le j \le n$. The matrix $V^2 = V \otimes V$ is given by (5):

$$V^2 = V \otimes V = \begin{bmatrix}
\bigvee_{i=1,\ldots,n} (v_{1i} \wedge v_{i1}) & \bigvee_{i=1,\ldots,n} (v_{1i} \wedge v_{i2}) & \cdots & \bigvee_{i=1,\ldots,n} (v_{1i} \wedge v_{in}) \\
\bigvee_{i=1,\ldots,n} (v_{2i} \wedge v_{i1}) & \bigvee_{i=1,\ldots,n} (v_{2i} \wedge v_{i2}) & \cdots & \bigvee_{i=1,\ldots,n} (v_{2i} \wedge v_{in}) \\
\vdots & \vdots & \ddots & \vdots \\
\bigvee_{i=1,\ldots,n} (v_{ni} \wedge v_{i1}) & \bigvee_{i=1,\ldots,n} (v_{ni} \wedge v_{i2}) & \cdots & \bigvee_{i=1,\ldots,n} (v_{ni} \wedge v_{in})
\end{bmatrix} \tag{5}$$

where "$\vee$" is the maximum operator and "$\wedge$" is the minimum operator. Then, there exists a positive integer $p$, where $p \le n - 1$, such that $V^p = V^{p+1} = V^{p+2} = \cdots$. Let $T = V^p$; then $T$ is called the transitive closure [13] of the relevance matrix $V$.

Definition 4.3: The relation matrix $R$ is a fuzzy matrix, where the element $R(c_i, c_j)$ represents the fuzzy relationship between concept $c_i$ and concept $c_j$, where $R(c_i, c_j) \in \{P, N, G, S, Z\}$ and P, N, G, S stand for fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization, respectively. If $R(c_i, c_j) = Z$, then the fuzzy relationship between concept $c_i$ and concept $c_j$ is not defined explicitly by the experts.

Let $R$ be a relation matrix

$$R = \begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
r_{n1} & r_{n2} & \cdots & r_{nn}
\end{bmatrix}$$

where $n$ is the number of concepts, $r_{ij} \in \{P, N, G, S, Z\}$, $1 \le i \le n$, and $1 \le j \le n$. The matrix $R^2 = R \otimes R$ is given by (6), where "$\vee$" is the operation of choosing the highest priority fuzzy relationship and "$\wedge$" is the operation of choosing the combination of two relationships according to Table II. Table II is similar to Table I except that we add the character "Z" to represent the relationship between concepts which is not explicitly defined by the experts. From Table II, we can see that the combination of two relationships of the same type results in a relationship of this type, except for negative associations (N), which yield positive associations (P). Furthermore, in Table II, we let these five kinds of fuzzy relationships have different priorities, i.e., the negative association (N) has the highest priority, generalization (G) and specialization (S) have the second highest priority, the priority of the positive association (P) is lower, and the relationships (Z) not explicitly defined by the experts have the lowest priority. In Table II, the combination of a high priority relationship and a low priority relationship results in the relationship of high priority, except for the combination of generalization (G) and specialization (S), which results in positive association (P). Then, there exists a positive integer $p$, where $p \le n - 1$, such that $R^p = R^{p+1} = R^{p+2} = \cdots$. Let $L = R^p$; then $L$ is called the transitive closure of the relation matrix $R$.

Fig. 4. An extended fuzzy concept network of Example 4.1.

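Example 4.1: Assume that there is an extended fuzzy concept network as shown in Fig. 4. Then we can model this extended fuzzy concept network by the relevance matrix $V$ and the relation matrix $R$ shown as follows:

$$V = \begin{bmatrix}
1 & 0.7 & 0.5 & 0 & 0.8 \\
0.7 & 1 & 0 & 0 & 0 \\
0.5 & 0 & 1 & 0.6 & 0 \\
0 & 0 & 0.6 & 1 & 0 \\
0.8 & 0 & 0 & 0 & 1
\end{bmatrix}, \qquad
R = \begin{bmatrix}
P & S & S & Z & N \\
G & P & Z & Z & Z \\
G & Z & P & S & Z \\
Z & Z & G & P & Z \\
N & Z & Z & Z & P
\end{bmatrix}.$$

Then, based on the previous discussion, we can obtain the transitive closure $T$ of the relevance matrix $V$ and the transitive closure $L$ of the relation matrix $R$; both matrices are listed in Example 5.1.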
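The max-min transitive closure of Definition 4.2 can be sketched as follows. This is a minimal illustration, assuming a reflexive relevance matrix (ones on the diagonal, as in Example 4.1) so that the powers $V, V^2, V^3, \ldots$ stabilize; the function names `maxmin_product` and `transitive_closure` are hypothetical.

```python
def maxmin_product(A, B):
    """Max-min composition: entry (i, j) is max over k of min(A[i][k], B[k][j])."""
    return [
        [max(min(A[i][k], B[k][j]) for k in range(len(B)))
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transitive_closure(V):
    """Multiply by V until the power no longer changes (V^p == V^(p+1))."""
    current = V
    while True:
        nxt = maxmin_product(current, V)
        if nxt == current:
            return current
        current = nxt


# Relevance matrix V of Example 4.1.
V = [
    [1.0, 0.7, 0.5, 0.0, 0.8],
    [0.7, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.6, 0.0],
    [0.0, 0.0, 0.6, 1.0, 0.0],
    [0.8, 0.0, 0.0, 0.0, 1.0],
]
T = transitive_closure(V)
for row in T:
    print(row)
```

For this input the result agrees with the matrix $T$ listed in Example 5.1. The same loop structure would apply to the relation matrix $R$ with the Table II operators in place of max and min, which is not shown here.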

V. FUZZY QUERY PROCESSING FOR DOCUMENT RETRIEVAL BASED ON EXTENDED FUZZY CONCEPT NETWORKS

In Section III, we introduced that a document can be represented by an extended fuzzy set, where each concept represents a topic or an attribute. In this section, we use document descriptor relevance matrices and document descriptor relation matrices to represent documents, where the document descriptor relevance matrix is used to represent the relevance degrees between concepts and documents, and the document descriptor relation matrix is used to represent the fuzzy relationships between concepts and documents. The definitions of document descriptor relevance matrices and document descriptor relation matrices are presented as follows.

Definition 5.1: Let $P$ be a set of documents, $P = \{d_1, d_2, \ldots, d_m\}$, and let $C$ be a set of concepts, $C = \{c_1, c_2, \ldots, c_n\}$. The document descriptor relevance matrix $D$ is shown as follows:

$$D = \begin{array}{c|cccc}
 & c_1 & c_2 & \cdots & c_n \\ \hline
d_1 & v_{11} & v_{12} & \cdots & v_{1n} \\
d_2 & v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_m & v_{m1} & v_{m2} & \cdots & v_{mn}
\end{array}$$

where $m$ is the number of documents, $n$ is the number of concepts, $v_{ij}$ stands for the relevance degree between document $d_i$ and concept $c_j$, $v_{ij} \in [0, 1]$, $1 \le i \le m$, and $1 \le j \le n$.

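Definition 5.2: Let $P$ be a set of documents, $P = \{d_1, d_2, \ldots, d_m\}$, and let $C$ be a set of concepts, $C = \{c_1, c_2, \ldots, c_n\}$. The document descriptor relation matrix $M$ is shown as follows:

$$M = \begin{array}{c|cccc}
 & c_1 & c_2 & \cdots & c_n \\ \hline
d_1 & r_{11} & r_{12} & \cdots & r_{1n} \\
d_2 & r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_m & r_{m1} & r_{m2} & \cdots & r_{mn}
\end{array}$$

where $m$ is the number of documents, $n$ is the number of concepts, $r_{ij}$ stands for the fuzzy relationship between document $d_i$ and concept $c_j$, $r_{ij} \in \{P, N, G, S, Z\}$, $1 \le i \le m$, and $1 \le j \le n$.

$$R^2 = R \otimes R = \begin{bmatrix}
\bigvee_{i=1,\ldots,n} (r_{1i} \wedge r_{i1}) & \bigvee_{i=1,\ldots,n} (r_{1i} \wedge r_{i2}) & \cdots & \bigvee_{i=1,\ldots,n} (r_{1i} \wedge r_{in}) \\
\bigvee_{i=1,\ldots,n} (r_{2i} \wedge r_{i1}) & \bigvee_{i=1,\ldots,n} (r_{2i} \wedge r_{i2}) & \cdots & \bigvee_{i=1,\ldots,n} (r_{2i} \wedge r_{in}) \\
\vdots & \vdots & \ddots & \vdots \\
\bigvee_{i=1,\ldots,n} (r_{ni} \wedge r_{i1}) & \bigvee_{i=1,\ldots,n} (r_{ni} \wedge r_{i2}) & \cdots & \bigvee_{i=1,\ldots,n} (r_{ni} \wedge r_{in})
\end{bmatrix} \tag{6}$$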

In a document descriptor relevance matrix $D$ and a document descriptor relation matrix $M$, the relevance degrees and fuzzy relationships between concepts and documents are given subjectively by experts. However, the experts may forget to set some relevance degrees and fuzzy relationships between concepts and documents. In this case, we can obtain the implicit relevance degrees and fuzzy relationships between concepts and documents by means of the transitive closure $T$ of the relevance matrix $V$ and the transitive closure $L$ of the relation matrix $R$, respectively. Let $D^* = D \otimes T$; then $D^*$ is the document descriptor relevance matrix containing implied relevance degrees between concepts and documents. Let $M^* = M \otimes L$; then $M^*$ is the document descriptor relation matrix containing implied fuzzy relationships between concepts and documents. The matrices $D^*$ and $M^*$ are used as a basis for similarity measures between queries and documents described later.
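The relevance part of this step, $D^* = D \otimes T$, is again a max-min product. The following sketch applies it to the rows of documents $d_1$ and $d_2$ from Example 5.1, with $T$ taken from the same example; the function name `maxmin_product` is hypothetical, and only these two rows are used here as an illustration.

```python
def maxmin_product(A, B):
    """Max-min composition: entry (i, j) is max over k of min(A[i][k], B[k][j])."""
    return [
        [max(min(A[i][k], B[k][j]) for k in range(len(B)))
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Transitive closure T of the relevance matrix (Example 5.1).
T = [
    [1.0, 0.7, 0.5, 0.5, 0.8],
    [0.7, 1.0, 0.5, 0.5, 0.7],
    [0.5, 0.5, 1.0, 0.6, 0.5],
    [0.5, 0.5, 0.6, 1.0, 0.5],
    [0.8, 0.7, 0.5, 0.5, 1.0],
]

# Rows of D for documents d1 and d2 (Example 5.1).
D = [
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.7, 0.0],
]

D_star = maxmin_product(D, T)
for row in D_star:
    print(row)
```

For these two rows the output matches the first two rows of $D^*$ given in Example 5.2: the zero entries of $D$ are replaced by degrees implied through related concepts.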

The user's query $Q$ can be represented by a query descriptor relevance vector $qv$ and a query descriptor relation vector $qr$. In this case, if the user's query is shown as follows:

$$Q = \{(c_1, \langle x_1, y_1 \rangle), (c_2, \langle x_2, y_2 \rangle), \ldots, (c_n, \langle x_n, y_n \rangle)\}$$

then

$$qv = \langle x_1, x_2, \ldots, x_n \rangle$$

$$qr = \langle y_1, y_2, \ldots, y_n \rangle$$

where $x_i \in [0, 1]$ indicates the desired relevance degree of the document with respect to concept $c_i$, $y_i \in \{P, N, G, S\}$ indicates the desired fuzzy relationship of the document with respect to concept $c_i$, and $1 \le i \le n$. In a query descriptor relevance vector $qv$, if $x_i = 0$, then the documents desired by the user do not possess concept $c_i$. If $x_i$ = "-", then the relevance degree of the desired documents with respect to concept $c_i$ can be neglected. In a query descriptor relation vector $qr$, if $y_i$ = "-", then the fuzzy relationships of the desired documents with respect to concept $c_i$ can be neglected. If $y_i$ = "N", then the user wants to perform a negative query, i.e., there is a negative relationship between the desired documents and concept $c_i$. If $y_i$ = "G", then the user wants to perform a generalization query, i.e., there is a generalization relationship between the desired documents and concept $c_i$. If $y_i$ = "S", then the user wants to perform a specialization query, i.e., there is a specialization relationship between the desired documents and concept $c_i$.

Let $\langle x, s \rangle$ and $\langle y, t \rangle$ be two pairs of values, where $x \in [0, 1]$, $y \in [0, 1]$, $s \in \{P, N, G, S\}$, and $t \in \{P, N, G, S\}$. Then the degree of similarity between $\langle x, s \rangle$ and $\langle y, t \rangle$ can be evaluated by the function $T$,

$$T(\langle x, s \rangle, \langle y, t \rangle) = \begin{cases} 0, & \text{if } s \neq t \\ 1 - |x - y|, & \text{if } s = t \end{cases} \tag{7}$$

where $T(\langle x, s \rangle, \langle y, t \rangle) \in [0, 1]$. The larger the value of $T(\langle x, s \rangle, \langle y, t \rangle)$, the greater the similarity between $\langle x, s \rangle$ and $\langle y, t \rangle$.

Assume that the document descriptor relevance vector $dv_i$ (i.e., the $i$th row of the document descriptor relevance matrix $D^*$), the document descriptor relation vector $dr_i$ (i.e., the $i$th row of the document descriptor relation matrix $M^*$), the query descriptor relevance vector $qv$, and the query descriptor relation vector $qr$ are represented as follows:

$$dv_i = \langle s_{i1}, s_{i2}, \ldots, s_{in} \rangle$$

$$dr_i = \langle t_{i1}, t_{i2}, \ldots, t_{in} \rangle$$

$$qv = \langle x_1, x_2, \ldots, x_n \rangle$$

$$qr = \langle y_1, y_2, \ldots, y_n \rangle$$

where $s_{ij} \in [0, 1]$, $x_i \in [0, 1]$, $t_{ij} \in \{P, N, G, S, Z\}$, $y_i \in \{P, N, G, S, Z\}$, $1 \le j \le n$, $1 \le i \le m$, $n$ is the number of concepts, and $m$ is the number of documents. Let $qv(j)$ and $qr(j)$ be the $j$th element of the query descriptor relevance vector $qv$ and the $j$th element of the query descriptor relation vector $qr$, respectively. If $qv(j)$ = "-" or $qr(j)$ = "-", then concept $c_j$ is neglected by the user's query. The degree of satisfaction that document $d_i$ satisfies the user's query $Q$ can be evaluated by

$$RS(d_i) = \frac{\displaystyle\sum_{\substack{j = 1, \ldots, n \\ qv(j) \neq \text{"-"} \text{ and } qr(j) \neq \text{"-"}}} T(\langle s_{ij}, t_{ij} \rangle, \langle x_j, y_j \rangle)}{k} \tag{8}$$

where $RS(d_i) \in [0, 1]$, $1 \le i \le m$, and $k$ is the number of concepts not neglected by the user's query. The larger the value of $RS(d_i)$, the greater the degree of satisfaction that the document $d_i$ satisfies the user's query. In a fuzzy information retrieval system, we also can set up a retrieval threshold value $\lambda$, where $\lambda \in [0, 1]$. If $RS(d_i) \ge \lambda$, then document $d_i$ satisfies the user's query. The information retrieval system would display every document having a retrieval status value greater than the threshold value $\lambda$, where $\lambda \in [0, 1]$, in a sequential order from the document with the highest retrieval status value to that with the lowest one.

Example 5.1: Consider the extended fuzzy concept network shown in Example 4.1, which has been modeled by the relevance matrix $V$ and the relation matrix $R$. The transitive closure $T$ of the relevance matrix $V$ and the transitive closure $L$ of the relation matrix $R$ are as follows:

$$T = \begin{bmatrix}
1 & 0.7 & 0.5 & 0.5 & 0.8 \\
0.7 & 1 & 0.5 & 0.5 & 0.7 \\
0.5 & 0.5 & 1 & 0.6 & 0.5 \\
0.5 & 0.5 & 0.6 & 1 & 0.5 \\
0.8 & 0.7 & 0.5 & 0.5 & 1
\end{bmatrix}, \qquad
L = \begin{bmatrix}
P & S & S & S & N \\
G & P & P & S & N \\
G & P & P & S & N \\
G & G & G & P & N \\
N & N & N & N & P
\end{bmatrix}.$$

Assume that there are five documents in a fuzzy information retrieval system, and the document descriptor relevance matrix $D$ and the document descriptor relation matrix $M$ are shown as follows:

$$D = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
0.5 & 1 & 0 & 0.7 & 0 \\
0 & 0 & 0 & 0.6 & 0 \\
0.8 & 1 & 1 & 1 & 0 \\
0.4 & 0.9 & 0 & 0 & 1
\end{bmatrix}, \qquad
M = \begin{bmatrix}
P & S & S & Z & Z \\
G & P & Z & S & Z \\
Z & Z & Z & S & Z \\
P & S & S & S & Z \\
P & S & Z & Z & N
\end{bmatrix}.$$


If the user's query is represented by the query descriptor relevance vector $qv$ and the query descriptor relation vector $qr$ as follows:

$$qv = \langle 0.6, 1, 0.8, \text{-}, 0.7 \rangle$$

$$qr = \langle P, S, G, \text{-}, N \rangle$$

then based on (7) and (8), the degree of satisfaction that each document satisfies the user's query can be evaluated as follows:

$$RS(d_1) = 0.625, \quad RS(d_2) = 0.25, \quad RS(d_3) = 0.55, \quad RS(d_4) = 0.6, \quad RS(d_5) = 0.6.$$

If the retrieval threshold given by the user is $\lambda = 0.5$, then we can see that document $d_2$ is not suitable for the user's query because the retrieval status value of document $d_2$ is less than the retrieval threshold value (where $\lambda = 0.5$). Furthermore, we also can see that the documents which satisfy the user's query are $d_1, d_4, d_5, d_3$. In this case, document $d_1$ is the best choice for the user's query because it has the largest retrieval status value.
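The evaluation in Example 5.1 can be reproduced with a short sketch of (7) and (8). The function names `similarity` and `retrieval_status` are hypothetical, the matrices `D_star` and `M_star` are the matrices $D^*$ and $M^*$ listed in Example 5.2, and returning 0.0 when every concept is neglected is an assumption of this sketch, since the text does not cover that case.

```python
def similarity(x, s, y, t):
    """Formula (7): 0 if the relationships differ, else 1 - |x - y|."""
    return 1 - abs(x - y) if s == t else 0.0


def retrieval_status(dv, dr, qv, qr):
    """Formula (8): average similarity over the concepts that are not neglected."""
    kept = [j for j in range(len(qv)) if qv[j] != "-" and qr[j] != "-"]
    if not kept:
        return 0.0  # assumption: an empty query matches nothing
    total = sum(similarity(dv[j], dr[j], qv[j], qr[j]) for j in kept)
    return total / len(kept)


# D* and M* as listed in Example 5.2.
D_star = [
    [1.0, 1.0, 1.0, 0.6, 0.8],
    [0.7, 1.0, 0.6, 0.7, 0.7],
    [0.5, 0.5, 0.6, 0.6, 0.5],
    [0.8, 0.7, 1.0, 0.6, 0.8],
    [0.8, 0.9, 0.5, 0.5, 1.0],
]
M_star = [
    ["P", "S", "S", "S", "N"],
    ["G", "P", "P", "S", "N"],
    ["P", "S", "S", "S", "N"],
    ["P", "S", "S", "S", "N"],
    ["P", "S", "S", "S", "N"],
]

# Query of Example 5.1; "-" marks a neglected concept.
qv = [0.6, 1.0, 0.8, "-", 0.7]
qr = ["P", "S", "G", "-", "N"]

scores = [retrieval_status(dv, dr, qv, qr) for dv, dr in zip(D_star, M_star)]
print([round(v, 3) for v in scores])
```

Up to floating-point rounding, the scores are the retrieval status values 0.625, 0.25, 0.55, 0.6, and 0.6 reported in the example.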

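Consider the following OR-connected query $Q$:

$$Q = Q_1 \text{ or } Q_2$$

where

$$Q_1 = \{(c_1, \langle x_{11}, y_{11} \rangle), (c_2, \langle x_{12}, y_{12} \rangle), \ldots, (c_n, \langle x_{1n}, y_{1n} \rangle)\},$$

$$Q_2 = \{(c_1, \langle x_{21}, y_{21} \rangle), (c_2, \langle x_{22}, y_{22} \rangle), \ldots, (c_n, \langle x_{2n}, y_{2n} \rangle)\}.$$

Then the sub-query $Q_1$ can be represented by a query descriptor relevance vector $qv_1$ and a query descriptor relation vector $qr_1$, and the sub-query $Q_2$ can be represented by a query descriptor relevance vector $qv_2$ and a query descriptor relation vector $qr_2$, where

$$qv_1 = \langle x_{11}, x_{12}, \ldots, x_{1n} \rangle$$

$$qr_1 = \langle y_{11}, y_{12}, \ldots, y_{1n} \rangle$$

$$qv_2 = \langle x_{21}, x_{22}, \ldots, x_{2n} \rangle$$

$$qr_2 = \langle y_{21}, y_{22}, \ldots, y_{2n} \rangle$$

where $x_{tj} \in [0, 1]$, $y_{tj} \in \{P, N, G, S\}$, $1 \le t \le 2$, and $1 \le j \le n$. Assume that the document descriptor relevance vector $dv_i$ (i.e., the $i$th row of the document relevance matrix $D^*$) and the document descriptor relation vector $dr_i$ (i.e., the $i$th row of the document relation matrix $M^*$) are as follows:

$$dv_i = \langle s_{i1}, s_{i2}, \ldots, s_{in} \rangle$$

$$dr_i = \langle t_{i1}, t_{i2}, \ldots, t_{in} \rangle.$$

Then the retrieval status value $RS_3(d_i)$ of document $d_i$ with respect to the query $Q$ can be evaluated by

$$RS_3(d_i) = \max(RS_1(d_i), RS_2(d_i)) \tag{9}$$

$$RS_1(d_i) = \frac{\displaystyle\sum_{\substack{j = 1, \ldots, n \\ qv_1(j) \neq \text{"-"} \text{ and } qr_1(j) \neq \text{"-"}}} T(\langle s_{ij}, t_{ij} \rangle, \langle x_{1j}, y_{1j} \rangle)}{k_1} \tag{10}$$

$$RS_2(d_i) = \frac{\displaystyle\sum_{\substack{j = 1, \ldots, n \\ qv_2(j) \neq \text{"-"} \text{ and } qr_2(j) \neq \text{"-"}}} T(\langle s_{ij}, t_{ij} \rangle, \langle x_{2j}, y_{2j} \rangle)}{k_2} \tag{11}$$

where $k_1$ is the number of concepts not neglected by the sub-query $Q_1$, $k_2$ is the number of concepts not neglected by the sub-query $Q_2$, $RS_1(d_i) \in [0, 1]$, $RS_2(d_i) \in [0, 1]$, and $1 \le i \le m$. $RS_1(d_i)$ represents the degree of similarity between the sub-query $Q_1$ and document $d_i$, $RS_2(d_i)$ represents the degree of similarity between the sub-query $Q_2$ and document $d_i$, and the retrieval status value $RS_3(d_i)$ represents the degree of similarity of the user's query $Q$ with respect to document $d_i$, where $1 \le i \le m$. The fuzzy information retrieval system would display every document having a retrieval status value greater than the threshold value $\lambda$ in a sequential order from the document with the highest retrieval status value to that with the lowest one, where $\lambda \in [0, 1]$.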

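Example 5.2: Under the same assumptions as in Example 5.1, let the retrieval threshold value given by the user be 0.5 (i.e., $\lambda = 0.5$), and let the document descriptor relevance matrix $D^*$ and the document descriptor relation matrix $M^*$ be as follows:

$$D^* = \begin{bmatrix} 1 & 1 & 1 & 0.6 & 0.8 \\ 0.7 & 1 & 0.6 & 0.7 & 0.7 \\ 0.5 & 0.5 & 0.6 & 0.6 & 0.5 \\ 0.8 & 0.7 & 1 & 0.6 & 0.8 \\ 0.8 & 0.9 & 0.5 & 0.5 & 1 \end{bmatrix}$$

$$M^* = \begin{bmatrix} P & S & S & S & N \\ G & P & P & S & N \\ P & S & S & S & N \\ P & S & S & S & N \\ P & S & S & S & N \end{bmatrix}.$$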

Assume that the user’s query $Q$ is as follows:

$$Q = Q_1 \text{ OR } Q_2$$

where the sub-query $Q_1$ can be represented by the query descriptor relevance vector $qv_1$ and the query descriptor relation vector $qr_1$ shown as follows:

$$qv_1 = \langle 0.6, 1, 0.8, -, 0.7\rangle$$

$$qr_1 = \langle P, S, G, -, N\rangle$$

and the sub-query $Q_2$ can be represented by the query descriptor relevance vector $qv_2$ and the query descriptor relation vector $qr_2$ shown as follows:

$$qv_2 = \langle 0.95, -, -, -, -\rangle$$

$$qr_2 = \langle P, -, -, -, -\rangle.$$

Then, based on formula (10), we can get

$$RS_1(d_1) = 0.625$$

$$RS_1(d_2) = 0.25$$

$$RS_1(d_3) = 0.55$$

$$RS_1(d_4) = 0.6$$

$$RS_1(d_5) = 0.6.$$
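As a check on the first of these values, the computation for $d_1$ can be written out. The sub-query $Q_1$ neglects only concept $c_4$, so $k_1 = 4$. The four per-concept terms used below (0.6, 1, 0, and 0.9) are the ones that appear for $d_1$ in Example 5.3, which uses the same query vectors and the same matrices; treating them as the values of $T$ here is an assumption of this note.

```latex
RS_1(d_1)
  = \frac{T_{c_1} + T_{c_2} + T_{c_3} + T_{c_5}}{k_1}
  = \frac{0.6 + 1 + 0 + 0.9}{4}
  = \frac{2.5}{4}
  = 0.625
```

The zero term comes from concept $c_3$, where the query asks for relation $G$ while the document row holds $S$.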

Based on formula (11), we can get

$$RS_2(d_1) = 0.95$$

$$RS_2(d_2) = 0$$

$$RS_2(d_3) = 0.55$$

$$RS_2(d_4) = 0.85$$

$$RS_2(d_5) = 0.85.$$

Furthermore, based on (9), we can get

$$RS^*(d_1) = \max(0.625, 0.95) = 0.95$$

$$RS^*(d_2) = \max(0.25, 0) = 0.25$$

$$RS^*(d_3) = \max(0.55, 0.55) = 0.55$$

$$RS^*(d_4) = \max(0.6, 0.85) = 0.85$$

$$RS^*(d_5) = \max(0.6, 0.85) = 0.85.$$

Because the retrieval threshold value given by the user is 0.5 (i.e., $\lambda = 0.5$), document $d_2$ does not satisfy the user’s query: its retrieval status value is less than the retrieval threshold value. The documents that satisfy the user’s query are $d_1$, $d_3$, $d_4$, and $d_5$. Furthermore, document $d_1$ is the best choice for the user’s query because it has the largest retrieval status value.
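The evaluation of an OR-connected query can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names (`T`, `rs_subquery`, `rs_or`) are hypothetical. The similarity function `T` is assumed to return $1 - |s - x|$ when the document and query relations match and 0 otherwise, a form chosen because it reproduces the figures in the examples. The second sub-query is assumed to ask for relevance 0.95 and relation P on concept $c_1$, since those values reproduce the $RS_2$ figures reported above.

```python
# Sketch of OR-connected query evaluation following formulas (9)-(11).
# ASSUMPTION: T is modeled as 1 - |s - x| when the relations match, else 0.
# This form reproduces the example's numbers; it is not quoted from the paper.

NEGLECTED = "-"  # marker for a concept that a sub-query ignores


def T(s, t, x, y):
    """Assumed similarity between document pair (s, t) and query pair (x, y)."""
    return 1.0 - abs(s - x) if t == y else 0.0


def rs_subquery(dv, dr, qv, qr):
    """Formulas (10)/(11): mean of T over the concepts the sub-query keeps."""
    active = [j for j in range(len(qv))
              if qv[j] != NEGLECTED and qr[j] != NEGLECTED]
    if not active:  # hypothetical guard: an all-neglected sub-query scores 0
        return 0.0
    return sum(T(dv[j], dr[j], qv[j], qr[j]) for j in active) / len(active)


def rs_or(dv, dr, subqueries):
    """Formula (9): an OR-connected query takes the best sub-query score."""
    return max(rs_subquery(dv, dr, qv, qr) for qv, qr in subqueries)


# Matrices D* and M* from Example 5.2 (one row per document d1..d5).
D_STAR = [
    [1.0, 1.0, 1.0, 0.6, 0.8],
    [0.7, 1.0, 0.6, 0.7, 0.7],
    [0.5, 0.5, 0.6, 0.6, 0.5],
    [0.8, 0.7, 1.0, 0.6, 0.8],
    [0.8, 0.9, 0.5, 0.5, 1.0],
]
M_STAR = [
    ["P", "S", "S", "S", "N"],
    ["G", "P", "P", "S", "N"],
    ["P", "S", "S", "S", "N"],
    ["P", "S", "S", "S", "N"],
    ["P", "S", "S", "S", "N"],
]

Q1 = ([0.6, 1.0, 0.8, "-", 0.7], ["P", "S", "G", "-", "N"])
# ASSUMPTION: x21 = 0.95 and y21 = "P", which reproduce the reported RS2 values.
Q2 = ([0.95, "-", "-", "-", "-"], ["P", "-", "-", "-", "-"])

THRESHOLD = 0.5  # retrieval threshold value from the example

scores = {f"d{i + 1}": round(rs_or(dv, dr, [Q1, Q2]), 4)
          for i, (dv, dr) in enumerate(zip(D_STAR, M_STAR))}

# Keep documents strictly above the threshold, highest score first.
ranked = sorted(((d, s) for d, s in scores.items() if s > THRESHOLD),
                key=lambda pair: pair[1], reverse=True)

print(scores)
print(ranked)
```

Under these assumptions the scores for d1 through d5 come out as 0.95, 0.25, 0.55, 0.85, and 0.85, so d2 is filtered out and d1 is ranked first, in agreement with the example.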

Weighted queries can also be processed by our method. Assume that there are $n$ concepts in a fuzzy information retrieval system, and assume that the weight of the concept $c_j$ given by the user is $w_j$, where $w_j \in [0, 1]$ and $\sum_{j=1}^{n} w_j = 1$. Furthermore, assume that the user’s query is as follows:

$$Q = \{(c_1, \langle x_1, y_1\rangle), (c_2, \langle x_2, y_2\rangle), \ldots, (c_n, \langle x_n, y_n\rangle)\}$$

where $x_i \in [0, 1]$ indicates the desired relevance degree of the document with respect to concept $c_i$, $y_i \in \{P, N, G, S\}$ indicates the desired fuzzy relationship of the document with respect to concept $c_i$, and $1 \le i \le n$. Based on the previous discussions, the user’s query $Q$ can be represented by a query descriptor relevance vector $qv$ and a query descriptor relation vector $qr$, where

$$qv = \langle x_1, x_2, \ldots, x_n\rangle$$

$$qr = \langle y_1, y_2, \ldots, y_n\rangle.$$

Assume that the $i$th row of the document descriptor relevance matrix $D^*$ is $\langle s_{i1}, s_{i2}, \ldots, s_{in}\rangle$ and that the $i$th row of the document descriptor relation matrix $M^*$ is $\langle t_{i1}, t_{i2}, \ldots, t_{in}\rangle$, where $s_{ij} \in [0, 1]$, $t_{ij} \in \{P, N, G, S, Z\}$, $1 \le i \le m$, and $1 \le j \le n$.

Then, the degree of similarity between the user’s query $Q$ and the document $d_i$ can be calculated as follows:

$$RS^*_w(d_i) = \sum_{\substack{qv(j) \neq \text{“-”} \text{ and } qr(j) \neq \text{“-”} \\ \text{and } j = 1, \ldots, n}} T(\langle s_{ij}, t_{ij}\rangle, \langle x_j, y_j\rangle) \times w_j \tag{12}$$

where the retrieval status value $RS^*_w(d_i)$ indicates the degree of similarity between the user’s query $Q$ and the document $d_i$, $RS^*_w(d_i) \in [0, 1]$, and $1 \le i \le m$. The system would display every document having a retrieval status value greater than the threshold value $\lambda$ in a sequential order from the document with the highest retrieval status value to that with the lowest one, where $\lambda \in [0, 1]$.
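One way to relate formula (12) to the unweighted case is offered here as an illustrative check rather than a statement from the source. Let $J$ (a symbol introduced only for this note) denote the set of concepts not neglected by the query, and let $k = |J|$. If the user assigns every concept in $J$ the same weight $w_j = 1/k$, the weighted sum becomes a plain average over the non-neglected concepts, which is the same form as the sub-query scores in (10) and (11).

```latex
RS^*_w(d_i)
  = \sum_{j \in J} T(\langle s_{ij}, t_{ij}\rangle, \langle x_j, y_j\rangle) \times \frac{1}{k}
  = \frac{1}{k} \sum_{j \in J} T(\langle s_{ij}, t_{ij}\rangle, \langle x_j, y_j\rangle)
```

Unequal weights shift the score toward the concepts that the user considers more important.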

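Example 5.3: Under the same assumptions as in Example 5.1, let the retrieval threshold value given by the user be 0.5 (i.e., $\lambda = 0.5$), and let the document descriptor relevance matrix $D^*$ and the document descriptor relation matrix $M^*$ be as follows:

$$D^* = \begin{bmatrix} 1 & 1 & 1 & 0.6 & 0.8 \\ 0.7 & 1 & 0.6 & 0.7 & 0.7 \\ 0.5 & 0.5 & 0.6 & 0.6 & 0.5 \\ 0.8 & 0.7 & 1 & 0.6 & 0.8 \\ 0.8 & 0.9 & 0.5 & 0.5 & 1 \end{bmatrix}$$

$$M^* = \begin{bmatrix} P & S & S & S & N \\ G & P & P & S & N \\ P & S & S & S & N \\ P & S & S & S & N \\ P & S & S & S & N \end{bmatrix}.$$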

Assume that the user’s query is represented by the query descriptor relevance vector $qv$ and the query descriptor relation vector $qr$ as follows:

$$qv = \langle 0.6, 1, 0.8, -, 0.7\rangle$$

$$qr = \langle P, S, G, -, N\rangle$$

and assume that the weights of the concepts $c_1$, $c_2$, $c_3$, and $c_5$ given by the user are 0.4, 0.4, 0.1, and 0.1, respectively. Then, based on formula (12), we can get

$$\begin{aligned}
RS^*_w(d_1) &= 0.6 \times 0.4 + 1 \times 0.4 + 0 \times 0.1 + 0.9 \times 0.1 = 0.73, \\
RS^*_w(d_2) &= 0 \times 0.4 + 0 \times 0.4 + 0 \times 0.1 + 1 \times 0.1 = 0.1, \\
RS^*_w(d_3) &= 0.9 \times 0.4 + 0.5 \times 0.4 + 0 \times 0.1 + 0.8 \times 0.1 = 0.64, \\
RS^*_w(d_4) &= 0.8 \times 0.4 + 0.7 \times 0.4 + 0 \times 0.1 + 0.9 \times 0.1 = 0.69, \\
RS^*_w(d_5) &= 0.8 \times 0.4 + 0.9 \times 0.4 + 0 \times 0.1 + 0.7 \times 0.1 = 0.75.
\end{aligned}$$

Because the retrieval threshold value given by the user is 0.5 (i.e., $\lambda = 0.5$), the documents that satisfy the user’s query are $d_1$, $d_3$, $d_4$, and $d_5$. Document $d_2$ does not satisfy the user’s query because its retrieval status value is less than the retrieval threshold value. In this case, document $d_5$ is the best choice for the user’s query because it has the largest retrieval status value.
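The weighted calculation can be sketched in Python as well. As before, this is a hedged illustration and not the authors' code: the function names are hypothetical, and `T` is assumed to return $1 - |s - x|$ when the relations match and 0 otherwise, which agrees with the per-concept terms printed in the example. The weight stored for the neglected concept $c_4$ is a placeholder of 0.0, since the example gives no weight for it and formula (12) never uses it.

```python
# Sketch of weighted query evaluation following formula (12).
# ASSUMPTION: T is modeled as 1 - |s - x| when the relations match, else 0.

NEGLECTED = "-"  # marker for a concept that the query ignores


def T(s, t, x, y):
    """Assumed similarity between document pair (s, t) and query pair (x, y)."""
    return 1.0 - abs(s - x) if t == y else 0.0


def rs_weighted(dv, dr, qv, qr, weights):
    """Formula (12): weighted sum of T over the concepts the query keeps."""
    return sum(T(dv[j], dr[j], qv[j], qr[j]) * weights[j]
               for j in range(len(qv))
               if qv[j] != NEGLECTED and qr[j] != NEGLECTED)


# Matrices D* and M* from Example 5.3 (one row per document d1..d5).
D_STAR = [
    [1.0, 1.0, 1.0, 0.6, 0.8],
    [0.7, 1.0, 0.6, 0.7, 0.7],
    [0.5, 0.5, 0.6, 0.6, 0.5],
    [0.8, 0.7, 1.0, 0.6, 0.8],
    [0.8, 0.9, 0.5, 0.5, 1.0],
]
M_STAR = [
    ["P", "S", "S", "S", "N"],
    ["G", "P", "P", "S", "N"],
    ["P", "S", "S", "S", "N"],
    ["P", "S", "S", "S", "N"],
    ["P", "S", "S", "S", "N"],
]

QV = [0.6, 1.0, 0.8, "-", 0.7]
QR = ["P", "S", "G", "-", "N"]
# Weights for c1, c2, c3, c5 come from the example; the c4 entry is a
# placeholder (ASSUMPTION) because c4 is neglected and never contributes.
WEIGHTS = [0.4, 0.4, 0.1, 0.0, 0.1]

THRESHOLD = 0.5  # retrieval threshold value from the example

scores = {f"d{i + 1}": round(rs_weighted(dv, dr, QV, QR, WEIGHTS), 4)
          for i, (dv, dr) in enumerate(zip(D_STAR, M_STAR))}

# Documents strictly above the threshold, highest score first.
ranked = sorted(((d, s) for d, s in scores.items() if s > THRESHOLD),
                key=lambda pair: pair[1], reverse=True)

print(scores)
print(ranked)
```

Under these assumptions the scores for d1 through d5 come out as 0.73, 0.1, 0.64, 0.69, and 0.75, so d5 is ranked first and d2 falls below the threshold, in agreement with the example.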

VI. CONCLUSIONS

In this paper, we have presented the concept of extended fuzzy concept networks, in which there are four kinds of fuzzy relationships between concepts: fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization. We have also presented a fuzzy information retrieval method based on extended fuzzy concept networks for document retrieval. Because it allows the users to perform positive queries, negative queries, generalization queries, and specialization queries, the proposed method supports fuzzy queries in a more flexible and more intelligent manner than the ones presented in [8] and [17].


1990.

[6] , “An inexact reasoning algorithm for dealing with inexact knowledge,” Int. J. Softw. Eng. Knowl. Eng., vol. 1, no. 3, pp. 227–244, 1991.

[7] S. M. Chen and Y. J. Horng, “Finding inheritance hierarchies in interval-valued fuzzy concept-networks,” Fuzzy Sets Syst., vol. 84, no. 1, pp. 75–83, 1996.

[8] S. M. Chen and J. Y. Wang, “Document retrieval using knowledge-based fuzzy information retrieval techniques,” IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 793–803, May 1995.

[9] G. T. Her and J. S. Ke, “A fuzzy information retrieval system model,” in Proc. 1983 National Computer Symp., Taiwan, R.O.C., 1983, pp. 147–151.

[10] Y. J. Horng and S. M. Chen, “Document retrieval based on extended fuzzy concept networks,” in Proc. 4th Nat. Conf. Defense Management, Taipei, Taiwan, R.O.C., 1996, vol. 2, pp. 1039–1050.

[11] I. Itzkovich and L. W. Hawkes, “Fuzzy extension of inheritance hierarchies,” Fuzzy Sets Syst., vol. 62, no. 2, pp. 143–153, 1994.

[12] M. Kamel, B. Hadfield, and M. Ismail, “Fuzzy query processing using clustering techniques,” Inf. Process. Manage., vol. 26, no. 2, pp. 279–293, 1990.

[13] A. Kandel, Fuzzy Mathematical Techniques with Applications. Reading, MA: Addison-Wesley, 1986.

[14] M. Kracker, “A fuzzy concept network model and its applications,” in Proc. 1st IEEE Int. Conf. Fuzzy Systems, 1992, pp. 761–768.

[15] D. H. Kraft and D. A. Buell, “Fuzzy sets and generalized Boolean retrieval systems,” Int. J. Man-Mach. Stud., vol. 19, no. 1, pp. 45–56, 1983.

[16] C. G. Looney, “Fuzzy Petri nets for rule-based decision making,” IEEE Trans. Syst., Man, Cybern., vol. 18, pp. 178–183, 1988.

[17] D. Lucarella and R. Morara, “FIRST: Fuzzy information retrieval system,” J. Inf. Sci., vol. 17, pp. 81–91, 1991.

[18] T. Murai, M. Miyakoshi, and M. Shimbo, “A fuzzy document retrieval method based on two-valued indexing,” Fuzzy Sets Syst., vol. 30, pp. 103–120, 1989.

[19] S. Miyamoto, “Information retrieval based on fuzzy associations,” Fuzzy Sets Syst., vol. 38, pp. 191–205, 1990.

[20] T. Radecki, “Mathematical model of time effective information retrieval system based on the theory of fuzzy set,” Inf. Process. Manage., vol. 13, pp. 109–116, 1977.

[21] T. Radecki, “Fuzzy set theoretical approach to document retrieval,” Inf. Process. Manage., vol. 15, pp. 247–259, 1979.

[22] , “Generalized Boolean methods of information retrieval,” Int. J. Man-Mach. Stud., vol. 18, no. 5, pp. 409–439, 1983.

[23] R. Rousseau, “On relative indexing in fuzzy retrieval systems,” Inf. Process. Manage., vol. 21, no. 5, pp. 415–417, 1985.

[24] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.

[25] V. Tahani, “A fuzzy model of document retrieval system,” Inf. Process. Manage., vol. 12, pp. 177–187, 1976.

[26] J. Y. Wang and S. M. Chen, “A knowledge-based method for fuzzy information retrieval,” in Proc. 1st Asian Fuzzy Systems Symp., Singapore, 1993.

[27] L. A. Zadeh, “Fuzzy sets,” Inf. Contr., vol. 8, pp. 338–353, 1965.

[28] M. Zemankova, “FIIS: A fuzzy intelligent information system,” Data Eng., vol. 12, no. 2, pp. 11–20, 1989.

[29] R. Zwick, E. Carlstein, and D. V. Budescu, “Measures of similarity

Abstract—The paper describes a specification model, called the Process and Data Net (PDN) model, used as the modeling tool for the M*-OBJECT information system design methodology. The model integrates the representation of static, dynamic, and behavioral aspects of a database application. PDN consists of two components: an object-oriented data model that describes static and behavioral aspects of objects of the system under analysis, and a process model that specifies how organization activities must be coordinated. The major features of the proposed approach are: 1) the system representation captures all relevant properties from the end-user viewpoint without unnecessary details concerning implementation, 2) complex data structures and data manipulations can be specified, and 3) specifications are executable for rapid prototyping.

I. INTRODUCTION

To design and develop an information system which supports modern (and complex) applications, such as office automation (OA), computer-integrated manufacturing (CIM), or software engineering, a sound conceptual design methodology is required. This problem has been recognized to be of strategic importance for CIM by several international projects such as CIMOSA [1] and the Purdue Enterprise Reference Architecture [2].

A methodology consists of models used to describe a real-world system under consideration (e.g., an enterprise), and methods, i.e., design strategies to elaborate the real-world description. First approaches devoted to conceptual modeling were mostly concerned with the representation of static properties, i.e., the modeling of data structures (to represent components of the object system) and integrity constraints on data (i.e., rules that data must satisfy). Significant models of this type, such as the Entity-Relationship model [3] and the semantic data model [4], have been developed.

The need to integrate the specification of dynamic properties in the system representation was soon recognized. Dynamic properties refer to processes. Processes are made of activities (linked by causal relationships) that must be specified to describe the system organization, and dynamic constraints, i.e., rules which must be satisfied by processes under analysis [5]. Several approaches have been proposed to take into account dynamic properties. Some of them (e.g., REMORA [6], TAXIS [7], or TEMPORA [8]) are based on the concept of an event, which can be seen as a control mechanism

Manuscript received March 18, 1995; revised December 15, 1995 and January 6, 1997.

G. Berio was with the Dipartimento di Informatica, Università di Torino, Corso Svizzera 185, I-10149 Torino, Italy. He is now with LGIPM, Université de Metz, F-57012 Metz, France.

A. DiLeva and P. Giolito are with the Dipartimento di Informatica, Università di Torino, I-10149 Torino, Italy.

F. Vernadat is with LGIPM, Université de Metz, F-57012 Metz, France.

Publisher Item Identifier S 1083-4419(99)00777-3.