A Fuzzy Information Retrieval Method Using Fuzzy-Valued Concept Networks
Yih-Jen Horng*, Shyi-Ming Chen*, and Chia-Hoang Lee**
*Department of Electronic Engineering
National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
**Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.
Abstract
In this paper, we present a fuzzy information retrieval method based on fuzzy-valued concept networks, where the relevant degree between any two concepts in a fuzzy-valued concept network is represented by arbitrary shapes of fuzzy numbers. There are two kinds of relevant relationships between any two concepts in a fuzzy-valued concept network (i.e., fuzzy positive association and fuzzy negative association). In order to reduce the time of fuzzy inference, the relevant matrices and the relationship matrices are used to model the fuzzy-valued concept networks. The elements of a relevant matrix represent the relevant degrees between concepts. The elements of a relationship matrix represent the relevant relationships between concepts.
Furthermore, to increase the flexibility of fuzzy information retrieval systems, we allow users' queries to be represented by arbitrary shapes of fuzzy numbers and to be formulated with the fuzzy positive association relationship and the fuzzy negative association relationship. We also present a fuzzy information retrieval method based on the network-type fuzzy-valued concept network architecture in the Internet environment.
Keywords: Fuzzy-Valued Concept Network, Fuzzy Information Retrieval, Fuzzy Number, Information Retrieval System, Network-Type Fuzzy-Valued Concept Network.
1. Introduction
Most existing information retrieval systems are based on the traditional Boolean logic model [16]. Information retrieval systems based on the Boolean logic model all assume that documents and users' queries can be represented exactly by index terms. This restricts these systems in practical use, especially when the information is uncertain or fuzzy. To overcome the drawbacks of the traditional Boolean logic model, models such as the probability model, the fuzzy set model, and the vector space model have been presented [16]. Since the fuzzy set model can properly represent the inexact and uncertain knowledge of human beings, much research has been devoted to using the fuzzy set model in the design of fuzzy information retrieval systems. Moreover, many information retrieval techniques have been presented, such as [1], [6], [10], [11], and [14], among others.
In [11], Lucarella et al. present an information retrieval method that uses fuzzy concept networks for knowledge representation. A fuzzy concept network consists of nodes and links. Each node in a fuzzy concept network represents a document or a concept (i.e., an index item or a topic of documents). Each link in a fuzzy concept network connects two concepts and is labeled with a real value between zero and one, which represents the relevant degree between the two concepts. Information retrieval systems are developed by means of fuzzy inference through fuzzy concept networks. Since fuzzy inference through the fuzzy concept network is time consuming, the information retrieval method proposed by Lucarella et al. is not efficient enough. In [3], Chen et al. used concept matrices to model fuzzy concept networks and performed fuzzy inference through concept matrices instead of fuzzy concept networks. Since fuzzy inference through concept matrices can be done more quickly, the fuzzy information retrieval systems can be more efficient.
However, the fuzzy concept networks used by Lucarella et al. [11] and Chen et al. [3] are restricted because the relevant degree between concepts must be a real value between zero and one and the concepts must be linked with a fuzzy positive association relationship.
Thus, the information retrieval systems based on this kind of fuzzy concept network are also restricted. If we allow the relevant degree between concepts in a fuzzy concept network to be represented by arbitrary shapes of fuzzy numbers and allow the concepts in a fuzzy concept network to be linked with a fuzzy positive association relationship or a fuzzy negative association relationship, then there is room for more flexibility.
In this paper, we present a fuzzy information retrieval method based on fuzzy-valued concept networks. A fuzzy-valued concept network consists of nodes and links. Each node represents a document or a concept, and each link between two nodes is associated with a tuple that represents the relevance between two concepts, where the elements of the tuple represent the relevant degree and the relevant relationship between the two concepts, respectively. The relevant degree between two nodes can be not only a real number between zero and one, but also a fuzzy number of arbitrary shape. Moreover, the relevant relationship between two concepts can be not only a fuzzy positive association, but also a fuzzy negative association. In order to reduce the time of fuzzy inference, we use relevant matrices and relationship matrices to model the fuzzy-valued concept network. The elements of a relevant matrix represent the relevant degrees between concepts. The elements of a relationship matrix represent the relevant relationships between concepts. Furthermore, to increase the flexibility of fuzzy information retrieval systems, we also allow users' queries to be represented by arbitrary shapes of fuzzy numbers and to be formulated with the fuzzy positive association relationship and the fuzzy negative association relationship.
Furthermore, because of the growth of the Internet, the documents required by users should not be bound to a single host computer. A smart information retrieval system might help users get the required documents from different computers through the Internet when the required documents cannot be found on the computer where the users submit their query expressions. Thus, we also extend our fuzzy-valued concept network architecture to the network-type fuzzy-valued concept network and present an information retrieval method in the Internet environment based on the network-type fuzzy-valued concept network.
2. Fuzzy-Valued Concept Networks
Firstly, we briefly review the concepts of fuzzy numbers [5] and the concepts of fuzzy positive association relationship [9] and fuzzy negative association relationship [9].
A fuzzy number F is a fuzzy set defined on the universe of discourse U that is both convex and normal. A fuzzy set A is convex if and only if for all $u_1, u_2$ in U,

$$f_A(\lambda u_1 + (1-\lambda) u_2) \ge \mathrm{Min}(f_A(u_1), f_A(u_2)), \qquad (1)$$

where $f_A$ is the membership function of the fuzzy set A, $f_A: U \to [0, 1]$, and $\lambda \in [0, 1]$. A fuzzy set A is normal if there exists $u_j \in U$ such that $f_A(u_j) = 1$.
In this paper, for convenience of explanation, we assume that the fuzzy numbers used in the fuzzy-valued concept network are all represented in the "close to" shape. However, fuzzy numbers of arbitrary shapes are allowed in the fuzzy-valued concept network presented in this paper. According to [5], a "close to $\gamma$" fuzzy number is shown in Fig. 1.

Fig. 1. A "close to $\gamma$" fuzzy number.

The membership function of the "close to $\gamma$" fuzzy number is

$$\mu_{\text{close to } \gamma}(u) = \frac{1}{1 + \left(\dfrac{u - \gamma}{\beta}\right)^2}, \qquad (2)$$

where the crossover points are $u = \gamma \pm \beta$, and the attribute $\beta$ is the "half-width" of the curve at the crossover point. The larger the value of $\beta$, the wider the curve is. In this paper, we assume that the value of $\beta$ is 0.1.
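The shape of a "close to" fuzzy number can be checked numerically. The sketch below assumes the bell-shaped form $1 / (1 + ((u - \gamma)/\beta)^2)$, which is consistent with crossover points at $\gamma \pm \beta$; the function name `close_to` and its default `beta = 0.1` are illustrative choices, not part of the paper.

```python
def close_to(u: float, gamma: float, beta: float = 0.1) -> float:
    """Membership grade of u in the fuzzy number "close to gamma".

    Assumed form: 1 / (1 + ((u - gamma) / beta) ** 2), so the grade is 1 at
    u = gamma and 0.5 at the crossover points u = gamma +/- beta.
    """
    if beta <= 0:
        raise ValueError("beta (the half-width) must be positive")
    return 1.0 / (1.0 + ((u - gamma) / beta) ** 2)


if __name__ == "__main__":
    # Peak, crossover point, and a point two half-widths away from the peak.
    for u in (0.7, 0.8, 0.9):
        print(u, round(close_to(u, gamma=0.7), 3))
```

A larger `beta` flattens the curve, so a point at a fixed distance from `gamma` receives a higher grade.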
According to [5], a fuzzy number $\tilde{A}$ may be decomposed into its level sets (i.e., $\alpha$-cuts), i.e.,

$$\tilde{A} = \int_0^1 \alpha A_\alpha, \qquad (3)$$

where $A_\alpha = [a_1^{(\alpha)}, a_2^{(\alpha)}]$ is the $\alpha$-cut of $\tilde{A}$, and $\alpha \in [0, 1]$. Assume that there is another fuzzy number $\tilde{B}$:

$$\tilde{B} = \int_0^1 \alpha B_\alpha, \qquad (4)$$

where $B_\alpha = [b_1^{(\alpha)}, b_2^{(\alpha)}]$ is the $\alpha$-cut of $\tilde{B}$, and $\alpha \in [0, 1]$. Then, according to [5], the "OR" operation and the "AND" operation of the fuzzy numbers $\tilde{A}$ and $\tilde{B}$ are defined by:

$$\tilde{A} \oplus \tilde{B} = \int_0^1 \alpha \left[a_1^{(\alpha)} \vee b_1^{(\alpha)},\; a_2^{(\alpha)} \vee b_2^{(\alpha)}\right], \qquad (5)$$

$$\tilde{A} \otimes \tilde{B} = \int_0^1 \alpha \left[a_1^{(\alpha)} \wedge b_1^{(\alpha)},\; a_2^{(\alpha)} \wedge b_2^{(\alpha)}\right], \qquad (6)$$

where "$\oplus$" and "$\otimes$" are the "OR" operator and the "AND" operator of fuzzy numbers, respectively; "$\wedge$" is the minimum operator, "$\vee$" is the maximum operator, and $\alpha \in [0, 1]$.
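The level-set operations can be illustrated with a short sketch. It assumes "close to" fuzzy numbers of the bell-shaped form $1/(1 + ((u-\gamma)/\beta)^2)$, whose $\alpha$-cut for $\alpha \in (0, 1]$ is the interval $\gamma \pm \beta\sqrt{1/\alpha - 1}$. Each fuzzy number is stored as a list of $\alpha$-cuts on a fixed grid of ten levels; the grid, the function names, and the omission of $\alpha = 0$ (where the cut is unbounded) are choices made for this illustration only.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]

# Discrete alpha levels 0.1, 0.2, ..., 1.0 (alpha = 0 is skipped: the cut is unbounded).
ALPHAS = [k / 10 for k in range(1, 11)]


def close_to_cuts(gamma: float, beta: float = 0.1) -> List[Interval]:
    """Alpha-cuts of "close to gamma", one interval per level in ALPHAS."""
    cuts = []
    for alpha in ALPHAS:
        half = beta * math.sqrt(1.0 / alpha - 1.0)
        cuts.append((gamma - half, gamma + half))
    return cuts


def fuzzy_or(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """OR of two fuzzy numbers: endpoint-wise maximum on every alpha-cut."""
    return [(max(a1, b1), max(a2, b2)) for (a1, a2), (b1, b2) in zip(a, b)]


def fuzzy_and(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """AND of two fuzzy numbers: endpoint-wise minimum on every alpha-cut."""
    return [(min(a1, b1), min(a2, b2)) for (a1, a2), (b1, b2) in zip(a, b)]


if __name__ == "__main__":
    a, b = close_to_cuts(0.7), close_to_cuts(0.6)
    print(fuzzy_and(a, b)[-1])  # cut at alpha = 1 of the AND result
    print(fuzzy_or(a, b)[-1])   # cut at alpha = 1 of the OR result
```

When both operands are "close to" numbers with the same $\beta$, the AND result is the operand with the smaller centre and the OR result is the operand with the larger centre, so the operations reduce to min and max on the centres.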
In the following, we briefly review the concepts of the fuzzy positive association relationship and the fuzzy negative association relationship from [9]:
(1) Fuzzy positive association:
It relates concepts that have in some contexts a similar meaning (e.g., person ↔ individual) or which are typically used in the same context (e.g., person ↔ address).
(2) Fuzzy negative association:
It relates concepts which are complementary (e.g., male ↔ female), incompatible (e.g., unemployed ↔ freelance) or antonyms (e.g., small ↔ large).
A fuzzy-valued concept network can be represented as FVCN(S, K), where S is a set of nodes, and each node stands for a concept or a document; K is the set of all links between nodes. If $k \in K$, then k can be represented by the tuple $(\tilde{\mu}, FR)$, where $\tilde{\mu}$ is the degree of linking strength and its value is a fuzzy number; FR is the relationship between the two concepts linked by k, and $FR \in \{P, N\}$, where P stands for the fuzzy positive association relationship and N stands for the fuzzy negative association relationship.
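As a data-structure illustration, FVCN(S, K) can be held as a set of nodes plus a dictionary of labelled links. The sketch below is one possible encoding under stated assumptions: the class names `Link` and `FVCN` are hypothetical, each fuzzy degree is stored only by its centre $\gamma$ (a "close to $\gamma$" number), and links are stored in the direction in which they are added.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Link:
    degree: float   # centre of the "close to" fuzzy number (assumed encoding)
    relation: str   # "P" (fuzzy positive) or "N" (fuzzy negative)


@dataclass
class FVCN:
    nodes: Set[str] = field(default_factory=set)
    links: Dict[Tuple[str, str], Link] = field(default_factory=dict)

    def add_link(self, a: str, b: str, degree: float, relation: str) -> None:
        """Add a link from a to b labelled with the tuple (degree, relation)."""
        if relation not in ("P", "N"):
            raise ValueError("relation must be 'P' or 'N'")
        if not 0.0 <= degree <= 1.0:
            raise ValueError("degree centre must lie in [0, 1]")
        self.nodes.update((a, b))
        self.links[(a, b)] = Link(degree, relation)

    def neighbours(self, a: str) -> List[Tuple[str, Link]]:
        """Nodes reachable from a over one link, with the link label."""
        return [(b, link) for (x, b), link in self.links.items() if x == a]


if __name__ == "__main__":
    net = FVCN()
    net.add_link("person", "individual", 0.9, "P")
    net.add_link("male", "female", 0.8, "N")
    print(net.neighbours("male"))
```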
3. Relevant Matrices and Relationship Matrices
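In the following, we introduce the definitions of relevant matrices and relationship matrices to model fuzzy-valued concept networks. The definitions of the transitive closure of relevant matrices and the transitive closure of relationship matrices are also presented.

Definition 3.1: The relevant matrix V is a fuzzy matrix, where the element $\tilde{v}_{ij}$ represents the relevant degree between concept $c_i$ and concept $c_j$ in a fuzzy-valued concept network, where $\tilde{v}_{ij}$ is a fuzzy number. If $\tilde{v}_{ij} = \tilde{0}$, then it means that the relevant degree between concept $c_i$ and concept $c_j$ is not given by the experts in the fuzzy-valued concept network.

Definition 3.2: Assume that V is a relevant matrix, and let

$$V^2 = V \otimes V = \begin{bmatrix} \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{1i} \otimes \tilde{v}_{i1}) & \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{1i} \otimes \tilde{v}_{i2}) & \cdots & \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{1i} \otimes \tilde{v}_{in}) \\ \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{2i} \otimes \tilde{v}_{i1}) & \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{2i} \otimes \tilde{v}_{i2}) & \cdots & \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{2i} \otimes \tilde{v}_{in}) \\ \vdots & \vdots & & \vdots \\ \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{ni} \otimes \tilde{v}_{i1}) & \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{ni} \otimes \tilde{v}_{i2}) & \cdots & \bigoplus\limits_{i=1,\ldots,n}(\tilde{v}_{ni} \otimes \tilde{v}_{in}) \end{bmatrix}, \qquad (7)$$

where "$\oplus$" and "$\otimes$" are the "OR" operator and the "AND" operator of fuzzy numbers, respectively. Then, there exists a positive integer p, $p \le n - 1$, such that $V^p = V^{p+1} = V^{p+2} = \cdots$. Let $T = V^p$; then T is called the transitive closure of the relevant matrix V.

Definition 3.3: The relationship matrix R is shown as follows:

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix},$$

where $r_{ij}$ stands for the relevant relationship between concept $c_i$ and concept $c_j$, and $r_{ij} \in \{P, N, Z\}$.

Definition 3.4: Assume that R is a relationship matrix, and let

$$R^2 = R \odot R = \begin{bmatrix} \bigsqcup\limits_{i=1,\ldots,n}(r_{1i} \sqcap r_{i1}) & \bigsqcup\limits_{i=1,\ldots,n}(r_{1i} \sqcap r_{i2}) & \cdots & \bigsqcup\limits_{i=1,\ldots,n}(r_{1i} \sqcap r_{in}) \\ \vdots & \vdots & & \vdots \\ \bigsqcup\limits_{i=1,\ldots,n}(r_{ni} \sqcap r_{i1}) & \bigsqcup\limits_{i=1,\ldots,n}(r_{ni} \sqcap r_{i2}) & \cdots & \bigsqcup\limits_{i=1,\ldots,n}(r_{ni} \sqcap r_{in}) \end{bmatrix},$$

where "$\sqcup$" is the operator of choosing the fuzzy relationship whose priority is the highest. In this paper, we give the first priority to the fuzzy negative association relationship (N), the fuzzy positive association relationship (P) gets the second priority, and the relationship (Z) gets the lowest priority (i.e., N > P > Z). "$\sqcap$" is the operator of choosing the combination of two relationships according to Table 1. Then, there exists a positive integer p, $p \le n - 1$, such that $R^p = R^{p+1} = R^{p+2} = \cdots$. Let $L = R^p$; then L is called the transitive closure of the relationship matrix R.

Table 1. The combination of fuzzy relationships.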
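Both transitive closures can be computed by repeated composition until the matrix stops changing. The sketch below makes three assumptions that are not taken from the paper: each "close to" fuzzy number is represented by its centre (with a common half-width, OR and AND then reduce to max and min on the centres), the contents of Table 1 are replaced by a hypothetical stand-in `COMBINE` (in particular, the entry for N combined with N is a guess), and any combination involving Z yields Z. The five-concept matrices are patterned on the Host 1 / Host 2 example in Section 5.

```python
from typing import List

PRIORITY = {"N": 2, "P": 1, "Z": 0}  # N > P > Z, as stated in the text

# Hypothetical stand-in for Table 1; the ("N", "N") entry is an assumption.
COMBINE = {("P", "P"): "P", ("P", "N"): "N", ("N", "P"): "N", ("N", "N"): "Z"}


def combine(a: str, b: str) -> str:
    """Combine two relationships along a path; anything involving Z gives Z."""
    return COMBINE.get((a, b), "Z")


def relevant_closure(V: List[List[float]]) -> List[List[float]]:
    """Max-min closure: compose T with V until T no longer changes."""
    n = len(V)
    T = [row[:] for row in V]
    for _ in range(n - 1):
        nxt = [[max(min(T[i][k], V[k][j]) for k in range(n)) for j in range(n)]
               for i in range(n)]
        if nxt == T:
            break
        T = nxt
    return T


def relationship_closure(R: List[List[str]]) -> List[List[str]]:
    """Closure under 'combine along a path, then pick the highest priority'."""
    n = len(R)
    L = [row[:] for row in R]
    for _ in range(n - 1):
        nxt = [[max((combine(L[i][k], R[k][j]) for k in range(n)), key=PRIORITY.get)
                for j in range(n)] for i in range(n)]
        if nxt == L:
            break
        L = nxt
    return L


V = [[1.0, 0.0, 0.7, 0.6, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.9],
     [0.7, 0.0, 1.0, 0.0, 0.0],
     [0.6, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.9, 0.0, 0.0, 1.0]]
R = [list("PZNPZ"), list("ZPZZP"), list("NZPZZ"), list("PZZPZ"), list("ZPZZP")]

if __name__ == "__main__":
    for row in relevant_closure(V):
        print(row)
    for row in relationship_closure(R):
        print(" ".join(row))
```

In this example the closure adds one implicit link: concepts c3 and c4, which are both linked to c1, become related with degree 0.6 and relationship N.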
4. Fuzzy Query Processing for Document Retrieval Based on Fuzzy-Valued Concept Networks
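Fuzzy matrices can also be used to represent the links between documents and concepts. In the following, we introduce the definitions of the document descriptor relevant matrix and the document descriptor relationship matrix.

Definition 4.1: Let D be the set of documents in the fuzzy-valued concept network, $D = \{d_1, d_2, \ldots, d_m\}$, and let C be the set of concepts in the fuzzy-valued concept network, $C = \{c_1, c_2, \ldots, c_n\}$. Then, the document descriptor relevant matrix E is shown as follows:

$$E = \begin{array}{c|cccc} & c_1 & c_2 & \cdots & c_n \\ \hline d_1 & \tilde{e}_{11} & \tilde{e}_{12} & \cdots & \tilde{e}_{1n} \\ d_2 & \tilde{e}_{21} & \tilde{e}_{22} & \cdots & \tilde{e}_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ d_m & \tilde{e}_{m1} & \tilde{e}_{m2} & \cdots & \tilde{e}_{mn} \end{array}$$

where m is the number of documents, n is the number of concepts, $\tilde{e}_{ij}$ stands for the relevant degree between document $d_i$ and concept $c_j$, $\tilde{e}_{ij}$ is a fuzzy number, $1 \le i \le m$, and $1 \le j \le n$.

Definition 4.2: The document descriptor relationship matrix F is shown as follows:

$$F = \begin{array}{c|cccc} & c_1 & c_2 & \cdots & c_n \\ \hline d_1 & f_{11} & f_{12} & \cdots & f_{1n} \\ d_2 & f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ d_m & f_{m1} & f_{m2} & \cdots & f_{mn} \end{array}$$

where $f_{ij}$ stands for the fuzzy relationship between document $d_i$ and concept $c_j$, and $f_{ij} \in \{P, N, Z\}$.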
However, the experts may forget to set the relevant degrees and relationships between some documents and some concepts. Since the implicit relevant degrees and relationships between concepts can be obtained from the transitive closure T of the relevant matrix V and the transitive closure L of the relationship matrix R, we can use T and L to get the implicit relevant degrees and relationships between documents and concepts. Let $E^* = E \otimes T$; then $E^*$ includes the implicit relevant degrees between documents and concepts. Let $F^* = F \odot L$; then $F^*$ includes the implicit relevant relationships between documents and concepts. $E^*$ and $F^*$ are then used as the basis for similarity measures between queries and documents. Each row of $E^*$ can be regarded as a document descriptor relevant vector, and each row of $F^*$ can be regarded as a document descriptor relationship vector.
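The two products can be sketched with one generic composition routine that is parameterised by the "combine along a path" and "pick among paths" operators. As assumptions for this illustration only, fuzzy numbers are represented by their centres (so the fuzzy AND/OR become min/max), the relationship combination table is a hypothetical stand-in for Table 1, and the matrices are patterned on the Host 1 example in Section 5.

```python
from typing import Callable, Iterable, List, TypeVar

X = TypeVar("X")

PRIORITY = {"N": 2, "P": 1, "Z": 0}  # N > P > Z
# Hypothetical stand-in for Table 1; pairs not listed (any pair with Z) give Z.
COMBINE = {("P", "P"): "P", ("P", "N"): "N", ("N", "P"): "N", ("N", "N"): "Z"}


def combine_rel(a: str, b: str) -> str:
    return COMBINE.get((a, b), "Z")


def pick_rel(rels: Iterable[str]) -> str:
    return max(rels, key=PRIORITY.get)


def compose(A: List[List[X]], B: List[List[X]],
            combine: Callable[[X, X], X],
            pick: Callable[[Iterable[X]], X]) -> List[List[X]]:
    """Generalised matrix product: entry (i, j) = pick_k combine(A[i][k], B[k][j])."""
    return [[pick(combine(A[i][k], B[k][j]) for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Transitive closures of the concept matrices (centres only).
T = [[1.0, 0.0, 0.7, 0.6, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.9],
     [0.7, 0.0, 1.0, 0.6, 0.0],
     [0.6, 0.0, 0.6, 1.0, 0.0],
     [0.0, 0.9, 0.0, 0.0, 1.0]]
L = [list("PZNPZ"), list("ZPZZP"), list("NZPNZ"), list("PZNPZ"), list("ZPZZP")]

# Document descriptor matrices for three documents d1..d3.
E = [[0.0, 0.0, 0.8, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.9, 0.6],
     [0.0, 0.0, 0.0, 0.0, 0.8]]
F = [list("ZZNPZ"), list("ZZZPP"), list("ZZZZP")]

E_star = compose(E, T, min, max)            # E* = E (x) T
F_star = compose(F, L, combine_rel, pick_rel)  # F* = F (.) L

if __name__ == "__main__":
    for row in E_star:
        print(row)
    for row in F_star:
        print(" ".join(row))
```

Document d3, which is explicitly linked only to c5, acquires an implicit degree of 0.8 for c2 because c2 and c5 are related with degree 0.9.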
The user's query Q can be represented by a query descriptor relevant vector $\overline{qv}$ and a query descriptor relationship vector $\overline{qr}$ shown as follows:

$$\overline{qv} = \langle \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n \rangle, \qquad \overline{qr} = \langle y_1, y_2, \ldots, y_n \rangle,$$

where $\tilde{x}_i$ means the relevant degree between the desired documents and concept $c_i$, $\tilde{x}_i$ is a fuzzy number, and $1 \le i \le n$; $y_i$ means the relationship between the desired documents and concept $c_i$, and $y_i \in \{P, N\}$. If $y_i = P$, then the desired documents should contain concept $c_i$; if $y_i = N$, then the desired documents should contain the complement of the concept $c_i$. Moreover, if the user does not set the values of $\tilde{x}_i$ and $y_i$, then concept $c_i$ is regarded as neglected by the user, and $\tilde{x}_i$ and $y_i$ are labeled "-". That is, the user "doesn't care" whether the retrieved documents contain concept $c_i$ or not.
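Assume that there are two tuples, $\langle \tilde{A}, B \rangle$ and $\langle \tilde{C}, D \rangle$, where $\tilde{A}$ and $\tilde{C}$ are fuzzy numbers, $B \in \{P, N, Z\}$, and $D \in \{P, N, Z\}$. Then the degree of similarity between $\langle \tilde{A}, B \rangle$ and $\langle \tilde{C}, D \rangle$ is given by a function $Y(\langle \tilde{A}, B \rangle, \langle \tilde{C}, D \rangle)$ evaluated over the $\alpha$-cuts of $\tilde{A}$ and $\tilde{C}$, where $\alpha \in [0, 1]$ and $Y(\langle \tilde{A}, B \rangle, \langle \tilde{C}, D \rangle) \in [0, 1]$.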
Assume that the document descriptor relevant vector $\overline{dv}_i$ and the document descriptor relationship vector $\overline{dr}_i$ are represented as follows:

$$\overline{dv}_i = \langle \tilde{v}_{i1}, \tilde{v}_{i2}, \ldots, \tilde{v}_{in} \rangle, \qquad \overline{dr}_i = \langle r_{i1}, r_{i2}, \ldots, r_{in} \rangle.$$

Then, the degree of satisfaction $RS(d_i)$ to which document $d_i$ satisfies the user's query Q can be evaluated by

$$RS(d_i) = \frac{\sum\limits_{\overline{qv}(j) \ne \text{"-"},\; j = 1, \ldots, n} Y\left(\langle \overline{dv}_i(j), \overline{dr}_i(j) \rangle, \langle \overline{qv}(j), \overline{qr}(j) \rangle\right)}{k},$$

where $\overline{qv}(j)$ is the jth element of the query descriptor relevant vector $\overline{qv}$, $\overline{qr}(j)$ is the jth element of the query descriptor relationship vector $\overline{qr}$, $1 \le j \le n$, $RS(d_i) \in [0, 1]$, and k is the number of concepts not neglected by the user's query. The information retrieval system displays every document whose degree of satisfaction is greater than a threshold value $\lambda$, where $\lambda \in [0, 1]$, in order from the document with the highest degree of satisfaction to the one with the lowest.
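The scoring and ranking step can be sketched as follows. The paper's similarity function Y is not reproduced here; `similarity` is a hypothetical stand-in that returns 0 when the two relationships differ and otherwise one minus the distance between the centres of the two "close to" numbers. Neglected concepts ("-" in the text) are encoded as `None`, and all function and variable names are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Doc = Tuple[Sequence[float], Sequence[str]]  # (relevant vector, relationship vector)


def similarity(doc_deg: float, doc_rel: str, q_deg: float, q_rel: str) -> float:
    """Hypothetical stand-in for Y: 0 if relationships differ, else 1 - |difference|."""
    if doc_rel != q_rel:
        return 0.0
    return 1.0 - abs(doc_deg - q_deg)


def satisfaction(doc: Doc, qv: Sequence[Optional[float]],
                 qr: Sequence[Optional[str]]) -> float:
    """Average similarity over the k concepts the query does not neglect."""
    dv, dr = doc
    used = [j for j, x in enumerate(qv) if x is not None]
    if not used:
        return 0.0
    total = sum(similarity(dv[j], dr[j], qv[j], qr[j]) for j in used)
    return total / len(used)


def retrieve(docs: Dict[str, Doc], qv: Sequence[Optional[float]],
             qr: Sequence[Optional[str]], threshold: float) -> List[Tuple[str, float]]:
    """Documents scoring above the threshold, highest score first."""
    scored = [(name, satisfaction(doc, qv, qr)) for name, doc in docs.items()]
    hits = [(name, s) for name, s in scored if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": ([0.0, 0.5], ["Z", "P"]),
        "d2": ([1.0, 0.5], ["P", "N"]),
        "d3": ([0.0, 1.0], ["Z", "P"]),
    }
    # The query neglects the first concept and asks for the second at about 0.5 (P).
    print(retrieve(docs, qv=[None, 0.5], qr=[None, "P"], threshold=0.3))
```

Dividing by the number of non-neglected concepts keeps the score in [0, 1] whenever each similarity value lies in [0, 1].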
5. Fuzzy Query Processing Using Fuzzy-Valued Concept Networks in the Internet Environment
Given the prevalence of the Internet [4], [18], the information about the documents needed by the user should not be bound to a single host computer. When the users' queries cannot be satisfied on the local computer, the information retrieval system should expand its search to other computers on the Internet until the required documents are either found or declared non-existent.
In this section, we present the network-type fuzzy-valued concept network architecture as the basis for fuzzy information retrieval in the Internet environment. The architecture of the network-type fuzzy-valued concept network is shown in Fig. 2.

Fig. 2. The architecture of the network-type fuzzy-valued concept network.
From Fig. 2, we can see that each host is linked to the Internet by the bold black lines. Each host has a local fuzzy-valued concept network as the knowledge base of the documents and concepts. Substantially, the local fuzzy-valued concept networks inside these hosts are the same as the ones introduced in the previous sections.
That is, the fuzzy-valued concept networks inside these hosts allow the values of the relevant degrees between concepts to be arbitrary shapes of fuzzy numbers, and the relevant relationships between nodes to be the fuzzy positive association relationship or the fuzzy negative association relationship.
Since the local fuzzy-valued concept networks inside these hosts are the same as the ones introduced in the previous sections, we can also model these local fuzzy-valued concept networks by relevant matrices and relationship matrices. Furthermore, we can get the transitive closures of the relevant matrices and the transitive closures of the relationship matrices when the relevant matrices and relationship matrices are known. The implicit relevant degrees and relationships between concepts can then be found in the transitive closures of the relevant matrices and the transitive closures of the relationship matrices, respectively.
The document descriptor relevant matrices and document descriptor relationship matrices can model the relevant degrees and fuzzy relationships between documents and concepts in each local fuzzy-valued concept network inside each host on the Internet.
However, the experts may forget to set the relevant degrees or fuzzy relationships between some documents and concepts. Because all associated concepts are linked together, we can get the implicit relevant degrees and fuzzy relationships between documents and concepts from the transitive closures of the relevant matrices and the transitive closures of the relationship matrices. Assume the document descriptor relevant matrix is E and the transitive closure of the relevant matrix is T; let $E^* = E \otimes T$, then $E^*$ includes all the implicit relevant degrees between documents and concepts. Assume the document descriptor relationship matrix is F and the transitive closure of the relationship matrix is L; let $F^* = F \odot L$, then $F^*$ includes all the implicit relationships between documents and concepts.
From the previous discussions, we know that the fuzzy-valued concept network contains nodes and links. These nodes stand for either documents or concepts. In the network-type fuzzy-valued concept network architecture, we assume that the local fuzzy-valued concept networks have identical concept nodes but may have different document nodes. Therefore, the relevant matrices and relationship matrices used to model the local fuzzy-valued concept networks are identical on every host, but the document descriptor relevant matrices and the document descriptor relationship matrices differ from host to host.
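Example 5.1: Assume that Fig. 3 and Fig. 4 show two local fuzzy-valued concept networks on Host 1 and Host 2, respectively, which are linked by the Internet. From Fig. 3 and Fig. 4, we can see that concepts $c_1, c_2, c_3, c_4, c_5$ and documents $d_1, d_2, d_3$ are located on Host 1, and that concepts $c_1, c_2, c_3, c_4, c_5$ and documents $d_4, d_5, d_6$ are located on Host 2.

Fig. 3. The fuzzy-valued concept network on Host 1.

Fig. 4. The fuzzy-valued concept network on Host 2.

From the previous discussions, we can see that the relevant matrices and relationship matrices on these two hosts are V and R, respectively, where rows and columns are indexed by $c_1, \ldots, c_5$:

$$V = \begin{bmatrix} \tilde{1} & \tilde{0} & \tilde{0.7} & \tilde{0.6} & \tilde{0} \\ \tilde{0} & \tilde{1} & \tilde{0} & \tilde{0} & \tilde{0.9} \\ \tilde{0.7} & \tilde{0} & \tilde{1} & \tilde{0} & \tilde{0} \\ \tilde{0.6} & \tilde{0} & \tilde{0} & \tilde{1} & \tilde{0} \\ \tilde{0} & \tilde{0.9} & \tilde{0} & \tilde{0} & \tilde{1} \end{bmatrix}, \qquad R = \begin{bmatrix} P & Z & N & P & Z \\ Z & P & Z & Z & P \\ N & Z & P & Z & Z \\ P & Z & Z & P & Z \\ Z & P & Z & Z & P \end{bmatrix}.$$

The transitive closure T of the relevant matrix V and the transitive closure L of the relationship matrix R are

$$T = \begin{bmatrix} \tilde{1} & \tilde{0} & \tilde{0.7} & \tilde{0.6} & \tilde{0} \\ \tilde{0} & \tilde{1} & \tilde{0} & \tilde{0} & \tilde{0.9} \\ \tilde{0.7} & \tilde{0} & \tilde{1} & \tilde{0.6} & \tilde{0} \\ \tilde{0.6} & \tilde{0} & \tilde{0.6} & \tilde{1} & \tilde{0} \\ \tilde{0} & \tilde{0.9} & \tilde{0} & \tilde{0} & \tilde{1} \end{bmatrix}, \qquad L = \begin{bmatrix} P & Z & N & P & Z \\ Z & P & Z & Z & P \\ N & Z & P & N & Z \\ P & Z & N & P & Z \\ Z & P & Z & Z & P \end{bmatrix}.$$

The document descriptor relevant matrix $E_1$ and the document descriptor relationship matrix $F_1$ on Host 1, and the document descriptor relevant matrix $E_2$ on Host 2, are as follows, where columns are indexed by $c_1, \ldots, c_5$:

$$E_1 = \begin{array}{c|ccccc} & c_1 & c_2 & c_3 & c_4 & c_5 \\ \hline d_1 & \tilde{0} & \tilde{0} & \tilde{0.8} & \tilde{1} & \tilde{0} \\ d_2 & \tilde{0} & \tilde{0} & \tilde{0} & \tilde{0.9} & \tilde{0.6} \\ d_3 & \tilde{0} & \tilde{0} & \tilde{0} & \tilde{0} & \tilde{0.8} \end{array}, \qquad F_1 = \begin{array}{c|ccccc} & c_1 & c_2 & c_3 & c_4 & c_5 \\ \hline d_1 & Z & Z & N & P & Z \\ d_2 & Z & Z & Z & P & P \\ d_3 & Z & Z & Z & Z & P \end{array},$$

$$E_2 = \begin{array}{c|ccccc} & c_1 & c_2 & c_3 & c_4 & c_5 \\ \hline d_4 & \tilde{0} & \tilde{0} & \tilde{0.9} & \tilde{0.8} & \tilde{0} \\ d_5 & \tilde{0} & \tilde{0} & \tilde{0} & \tilde{0.6} & \tilde{0.9} \\ d_6 & \tilde{0} & \tilde{0} & \tilde{0} & \tilde{0} & \tilde{1} \end{array}.$$

Let $E_1^* = E_1 \otimes T$; then $E_1^*$ contains the implicit relevant degrees between documents and concepts of the local fuzzy-valued concept network on Host 1. Let $F_1^* = F_1 \odot L$; then $F_1^*$ contains the implicit relationships between documents and concepts of the local fuzzy-valued concept network on Host 1. Let $E_2^* = E_2 \otimes T$; then $E_2^*$ contains the implicit relevant degrees between documents and concepts of the local fuzzy-valued concept network on Host 2. Let $F_2^* = F_2 \odot L$; then $F_2^*$ contains the implicit relationships between documents and concepts of the local fuzzy-valued concept network on Host 2, where $E_1^*$, $F_1^*$, $E_2^*$, and $F_2^*$ are shown as follows:

$$E_1^* = \begin{array}{c|ccccc} & c_1 & c_2 & c_3 & c_4 & c_5 \\ \hline d_1 & \tilde{0.7} & \tilde{0} & \tilde{0.8} & \tilde{1} & \tilde{0} \\ d_2 & \tilde{0.6} & \tilde{0.6} & \tilde{0.6} & \tilde{0.9} & \tilde{0.6} \\ d_3 & \tilde{0} & \tilde{0.8} & \tilde{0} & \tilde{0} & \tilde{0.8} \end{array}, \qquad F_1^* = \begin{array}{c|ccccc} & c_1 & c_2 & c_3 & c_4 & c_5 \\ \hline d_1 & \ldots & \ldots & \ldots & \ldots & \ldots \\ d_2 & P & P & N & P & P \\ d_3 & Z & P & Z & Z & P \end{array},$$

$$E_2^* = \begin{array}{c|ccccc} & c_1 & c_2 & c_3 & c_4 & c_5 \\ \hline d_4 & \tilde{0.7} & \tilde{0} & \tilde{0.9} & \tilde{0.8} & \tilde{0} \\ d_5 & \tilde{0.6} & \tilde{0.9} & \tilde{0.6} & \tilde{0.6} & \tilde{0.9} \\ d_6 & \tilde{0} & \tilde{0.9} & \tilde{0} & \tilde{0} & \tilde{1} \end{array}, \qquad F_2^* = \begin{array}{c|ccccc} & c_1 & c_2 & c_3 & c_4 & c_5 \\ \hline d_4 & N & Z & P & N & Z \\ d_5 & P & P & N & P & P \\ d_6 & \ldots & \ldots & \ldots & \ldots & \ldots \end{array},$$

where $E_1^*$, $F_1^*$, $E_2^*$, and $F_2^*$ form the basis for computing the similarities between documents and users' queries.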
Assume a user formulates his (her) query expression in the fuzzy information retrieval system based on the network-type fuzzy-valued concept network on Host 1. First, this query expression is handled by the method presented in Section 4. If the desired documents are not found on Host 1, other hosts can be chosen from a list of hosts, either by the user or automatically by the system, and the user's query is sent to the chosen hosts. Assume Host 2 is chosen; then the user's query is handled on Host 2 to see if the desired documents are located on Host 2. If the desired documents do not exist on Host 2, then other hosts are chosen to process the user's query. These steps are repeated until the desired documents are found or are determined not to exist.
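The forwarding procedure described above can be sketched as a loop over hosts. In this illustration each host is modelled as a callable that takes a query and returns a list of (document, degree of satisfaction) pairs; the function name `search_hosts`, the host ordering, and the use of plain callables in place of network requests are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Result = List[Tuple[str, float]]        # (document name, degree of satisfaction)
SearchFn = Callable[[object], Result]   # hypothetical per-host query handler


def search_hosts(query: object, local_host: str,
                 hosts: Dict[str, SearchFn]) -> Tuple[Optional[str], Result]:
    """Try the local host first, then the remaining hosts in listed order.

    Returns the first host that yields any documents together with its
    results, or (None, []) when no host holds a satisfying document.
    """
    order = [local_host] + [name for name in hosts if name != local_host]
    for name in order:
        results = hosts[name](query)
        if results:
            return name, results
    return None, []


if __name__ == "__main__":
    hosts = {
        "Host 1": lambda q: [],               # nothing satisfies the query locally
        "Host 2": lambda q: [("d5", 0.9)],    # a remote host holds a match
    }
    print(search_hosts("some query", "Host 1", hosts))
```

Because all hosts are assumed to share the same concept matrices, only the query vectors need to be sent; each host scores them against its own document descriptor matrices.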
Example 5.2: As in Example 5.1, the fuzzy-valued concept network on Host 1 and the fuzzy-valued concept network on Host 2 are shown in Fig. 3 and Fig. 4, respectively. Assume that the user first submits his (her) query on Host 1 and hopes that the retrieved documents contain concept $c_2$ (the degree of strength is about 0.8), concept $c_4$ (the degree of strength is about 0.9), and the complement of concept $c_5$ (the degree of strength is about 1). Then the user's query Q can be represented by a query descriptor relevant vector $\overline{qv}_1$ and a query descriptor relationship vector $\overline{qr}_1$ shown as follows, where the elements correspond to $c_1, c_2, c_3, c_4, c_5$:

$$\overline{qv}_1 = \langle -, \tilde{0.8}, -, \tilde{0.9}, \tilde{1} \rangle,$$
CI