Publisher: Taylor & Francis
Cybernetics and Systems: An International Journal
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ucbs20

A KNOWLEDGE-BASED METHOD FOR FUZZY QUERY PROCESSING FOR DOCUMENT RETRIEVAL
SHYI-MING CHEN
Published online: 29 Oct 2010.

To cite this article: SHYI-MING CHEN (1997) A KNOWLEDGE-BASED METHOD FOR FUZZY QUERY PROCESSING FOR DOCUMENT RETRIEVAL, Cybernetics and Systems: An International Journal, 28:1, 99-119, DOI: 10.1080/019697297126272
To link to this article: http://dx.doi.org/10.1080/019697297126272
A KNOWLEDGE-BASED METHOD FOR FUZZY QUERY PROCESSING FOR DOCUMENT RETRIEVAL
SHYI-MING CHEN, WEN-HOAR HSIAO, YIH-JEN HORNG
Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
This paper presents a knowledge-based method for processing fuzzy queries and weighted-fuzzy queries for document retrieval, where fuzzy concept matrices are used for knowledge representation and the elements in a fuzzy concept matrix are represented by trapezoidal fuzzy numbers parameterized by quadruples $(a, b, c, d)$, where $0 \le a \le b \le c \le d \le 1$. The method consequently provides intelligent retrieval capability and supports flexible user queries.
Salton and McGill (1983) pointed out that information retrieval is concerned with the representation, storage, organization, and accessing of information items. Most commercial information retrieval systems still adopt the Boolean logic model for information retrieval. However, information retrieval systems based on the Boolean logic model are rather restricted in their applications because they cannot process fuzzy queries. Several fuzzy information retrieval methods based on fuzzy set theory (Zadeh, 1965) have been proposed to overcome this disadvantage of the Boolean logic model, such as those in Her and Ke (1983), Kraft and Buell (1983), Miyamoto (1990), Murai et al. (1989), Radechi (1979), Tahani (1976), and Zemankova (1989). Although these fuzzy information retrieval methods can process fuzzy queries, their efficiency and effectiveness are not satisfactory. Lucarella and Morara (1991) presented a fuzzy information retrieval system, FIRST, based on concept networks. In Chen and Wang (1993, 1995a) and Wang and Chen (1993), we presented methods for document retrieval using knowledge-based fuzzy information retrieval techniques. These methods allow the system's users to perform simple queries, weighted queries, interval queries, and weighted-interval queries. In this paper, we extend the works of Chen and Wang (1993, 1995a), Lucarella and Morara (1991), and Wang and Chen (1993) to present a knowledge-based method for dealing with fuzzy queries and weighted-fuzzy queries for document retrieval, where fuzzy concept matrices are used for knowledge representation, the elements in a fuzzy concept matrix represent fuzzy relevant values between concepts, and the fuzzy relevant values between concepts are represented by trapezoidal fuzzy numbers. The transitive closure of the fuzzy concept matrix is calculated by fuzzy number arithmetic operations to evaluate the implicit fuzzy relevant values between concepts. The proposed method is more flexible than the ones presented in Chen and Wang (1993, 1995a), Lucarella and Morara (1991), and Wang and Chen (1993) because it allows the system's users to perform fuzzy queries and weighted-fuzzy queries. The method consequently provides intelligent retrieval capability and supports flexible user queries.

The authors would like to thank the referees for providing very helpful comments and suggestions. Their insight and comments led to a better presentation of the ideas expressed in this paper.

This work was supported in part by the National Science Council, Republic of China, under grant NSC 83-0408-E-009-041.

Address correspondence to Shyi-Ming Chen, Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, Republic of China.

Cybernetics and Systems: An International Journal, 28:99–119, 1997
Copyright © 1997 Taylor & Francis
0196-9722/97 $12.00 + .00
BASIC CONCEPTS OF FUZZY SET THEORY

In 1965, Zadeh proposed the theory of fuzzy sets. In the following, we briefly review some basic definitions of fuzzy sets from Chen (1992a, b, c; 1994), Chen et al. (1991), Kandel (1986), Kaufman and Gupta (1985, 1988), and Zadeh (1965). Let U be the universe of discourse, $U = \{u_1, u_2, \ldots, u_n\}$. A fuzzy set A in U is a set of ordered pairs
$$A = \{(u_1, f_A(u_1)), (u_2, f_A(u_2)), \ldots, (u_n, f_A(u_n))\}$$
where $f_A$ is the membership function of A, $f_A: U \to [0, 1]$, and $f_A(u_i)$ indicates the grade of membership of $u_i$ in A. A fuzzy set A is convex if and only if for all $u_1, u_2$ in U,
$$f_A\big(\lambda u_1 + (1 - \lambda) u_2\big) \ge \min\big(f_A(u_1), f_A(u_2)\big) \tag{1}$$
where $\lambda \in [0, 1]$. A fuzzy set A of the universe of discourse U is called a normal fuzzy set if $\exists\, u_i \in U$ such that $f_A(u_i) = 1$. A fuzzy number is a fuzzy subset of the universe of discourse U that is both convex and normal. A fuzzy number M of the universe of discourse U may also be characterized by a trapezoidal distribution parameterized by a quadruple $(a, b, c, d)$, as shown in Figure 1.
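Figure 1 gives the trapezoidal distribution only graphically. The sketch below assumes the usual piecewise-linear reading of a trapezoid $(a, b, c, d)$: membership rises linearly on $[a, b]$, equals 1 on $[b, c]$, falls linearly on $[c, d]$, and is 0 elsewhere. The function name `trapezoid_membership` is hypothetical and not taken from the paper.

```python
def trapezoid_membership(u: float, a: float, b: float, c: float, d: float) -> float:
    """Grade of membership of u in the trapezoidal fuzzy number (a, b, c, d).

    Assumed shape (standard piecewise-linear trapezoid): 0 outside [a, d],
    linear rise on [a, b], plateau of 1 on [b, c], linear fall on [c, d].
    """
    if not a <= b <= c <= d:
        raise ValueError("expected a <= b <= c <= d")
    if u < a or u > d:
        return 0.0
    if b <= u <= c:
        return 1.0
    if u < b:
        # Rising edge: a <= u < b, so b - a > 0 and the division is safe.
        return (u - a) / (b - a)
    # Falling edge: c < u <= d, so d - c > 0 and the division is safe.
    return (d - u) / (d - c)


# A crisp value x is the degenerate trapezoid (x, x, x, x).
print(trapezoid_membership(0.5, 0.5, 0.5, 0.5, 0.5))
print(trapezoid_membership(0.37, 0.32, 0.42, 0.58, 0.65))
```

The plateau at height 1 makes the set normal, and the piecewise-linear edges make it convex in the sense of formula (1), so this shape meets both conditions for a fuzzy number.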
Figure 1. A trapezoidal fuzzy number.

Let A and B be two trapezoidal fuzzy numbers, where $A = (a_1, b_1, c_1, d_1)$ and $B = (a_2, b_2, c_2, d_2)$. The trapezoidal fuzzy numbers A and B are called equal (i.e., $A = B$) if and only if $a_1 = a_2$, $b_1 = b_2$, $c_1 = c_2$, and $d_1 = d_2$.

Let A and B be two trapezoidal fuzzy numbers, where $A = (a_1, b_1, c_1, d_1)$ and $B = (a_2, b_2, c_2, d_2)$. Based on Chen (1992a) and Kaufman and Gupta (1985, 1988), the addition, subtraction, multiplication, division, AND, OR, and ratio operations of the trapezoidal fuzzy numbers A and B are defined as follows.
Fuzzy number addition $\oplus$:
$$A \oplus B = (a_1, b_1, c_1, d_1) \oplus (a_2, b_2, c_2, d_2) = (a_1 + a_2,\ b_1 + b_2,\ c_1 + c_2,\ d_1 + d_2) \tag{2}$$

Fuzzy number subtraction $\ominus$:
$$A \ominus B = (a_1, b_1, c_1, d_1) \ominus (a_2, b_2, c_2, d_2) = (a_1 - d_2,\ b_1 - c_2,\ c_1 - b_2,\ d_1 - a_2) \tag{3}$$

Fuzzy number multiplication $\otimes$:
$$A \otimes B = (a_1, b_1, c_1, d_1) \otimes (a_2, b_2, c_2, d_2) = (a_1 \times a_2,\ b_1 \times b_2,\ c_1 \times c_2,\ d_1 \times d_2) \tag{4}$$

Fuzzy number division $\oslash$:
$$A \oslash B = (a_1, b_1, c_1, d_1) \oslash (a_2, b_2, c_2, d_2) = (a_1 / d_2,\ b_1 / c_2,\ c_1 / b_2,\ d_1 / a_2) \tag{5}$$

Fuzzy number AND $\tilde{\wedge}$:
$$A \mathbin{\tilde{\wedge}} B = (a_1, b_1, c_1, d_1) \mathbin{\tilde{\wedge}} (a_2, b_2, c_2, d_2) = \big(\min(a_1, a_2),\ \min(b_1, b_2),\ \min(c_1, c_2),\ \min(d_1, d_2)\big) \tag{6}$$

Fuzzy number OR $\tilde{\vee}$:
$$A \mathbin{\tilde{\vee}} B = (a_1, b_1, c_1, d_1) \mathbin{\tilde{\vee}} (a_2, b_2, c_2, d_2) = \big(\max(a_1, a_2),\ \max(b_1, b_2),\ \max(c_1, c_2),\ \max(d_1, d_2)\big) \tag{7}$$

Fuzzy number ratio $\tilde{\div}$:
$$A \mathbin{\tilde{\div}} B = (a_1, b_1, c_1, d_1) \mathbin{\tilde{\div}} (a_2, b_2, c_2, d_2) = (a_1 / a_2,\ b_1 / b_2,\ c_1 / c_2,\ d_1 / d_2) \tag{8}$$

Let k be a real number between zero and one, and let A be a trapezoidal fuzzy number, $A = (a_1, b_1, c_1, d_1)$. Then
$$k \otimes A = (k, k, k, k) \otimes (a_1, b_1, c_1, d_1) = (k \times a_1,\ k \times b_1,\ k \times c_1,\ k \times d_1) \tag{9}$$

In the following, we introduce a defuzzification technique for trapezoidal fuzzy numbers (Kaufman and Gupta, 1988; Chen, 1994a). Consider the trapezoidal fuzzy number shown in Figure 2, where e is the defuzzification value of the trapezoidal fuzzy number. From Figure 2, e splits the area under the trapezoid into two equal parts, so that
$$\begin{aligned}
(e - b) \times 1 + \tfrac{1}{2}(b - a) \times 1 &= (c - e) \times 1 + \tfrac{1}{2}(d - c) \times 1\\
\Rightarrow\ (e - b) - (c - e) &= \tfrac{1}{2}(d - c) - \tfrac{1}{2}(b - a)\\
\Rightarrow\ 2e &= \frac{a + d - b - c}{2} + \frac{2b + 2c}{2}\\
\Rightarrow\ 2e &= \frac{a + b + c + d}{2}\\
\Rightarrow\ e &= \frac{a + b + c + d}{4}
\end{aligned} \tag{10}$$
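Formulas (2) to (10) all operate on the four parameters separately, so they translate directly into code. The class below is a minimal sketch; the name `Trap` and its method names are illustrative assumptions, not part of the paper. Division (5) and ratio (8) assume the divisor has no zero components, and no ordering check is applied to results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trap:
    """Trapezoidal fuzzy number (a, b, c, d); results are returned as computed."""
    a: float
    b: float
    c: float
    d: float

    def __add__(self, o: "Trap") -> "Trap":        # formula (2)
        return Trap(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)

    def __sub__(self, o: "Trap") -> "Trap":        # formula (3): crossed bounds
        return Trap(self.a - o.d, self.b - o.c, self.c - o.b, self.d - o.a)

    def __mul__(self, o: "Trap") -> "Trap":        # formula (4)
        return Trap(self.a * o.a, self.b * o.b, self.c * o.c, self.d * o.d)

    def __truediv__(self, o: "Trap") -> "Trap":    # formula (5): crossed bounds
        return Trap(self.a / o.d, self.b / o.c, self.c / o.b, self.d / o.a)

    def fuzzy_and(self, o: "Trap") -> "Trap":      # formula (6)
        return Trap(min(self.a, o.a), min(self.b, o.b), min(self.c, o.c), min(self.d, o.d))

    def fuzzy_or(self, o: "Trap") -> "Trap":       # formula (7)
        return Trap(max(self.a, o.a), max(self.b, o.b), max(self.c, o.c), max(self.d, o.d))

    def ratio(self, o: "Trap") -> "Trap":          # formula (8): straight componentwise
        return Trap(self.a / o.a, self.b / o.b, self.c / o.c, self.d / o.d)

    def scale(self, k: float) -> "Trap":           # formula (9): k ⊗ A, k real in [0, 1]
        return Trap(k * self.a, k * self.b, k * self.c, k * self.d)

    def defuzzify(self) -> float:                  # formula (10)
        return (self.a + self.b + self.c + self.d) / 4.0


A = Trap(0.25, 0.5, 0.5, 0.75)
B = Trap(0.5, 0.5, 0.75, 1.0)
print(A + B, A - B, A.fuzzy_and(B), A.defuzzify())
```

A frozen dataclass gives value equality for free, which mirrors the equality definition above (two trapezoidal fuzzy numbers are equal when all four parameters agree).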
Figure 2. Defuzzification of a trapezoidal fuzzy number.

In the following, we present a similarity measure (Chen and Lin, 1995b) for measuring the degree of similarity between two trapezoidal fuzzy numbers. Let A and B be two trapezoidal fuzzy numbers, where $A = (a_1, b_1, c_1, d_1)$ and $B = (a_2, b_2, c_2, d_2)$. The degree of similarity between the trapezoidal fuzzy numbers A and B can be measured by the similarity function S,
$$S(A, B) = 1 - \frac{|a_1 - a_2| + |b_1 - b_2| + |c_1 - c_2| + |d_1 - d_2|}{4} \tag{11}$$
where $S(A, B) \in [0, 1]$. The larger the value of $S(A, B)$, the greater the similarity between the trapezoidal fuzzy numbers A and B. If $A = (1, 1, 1, 1)$ and $B = (0, 0, 0, 0)$, then
$$S(A, B) = 1 - \frac{|1 - 0| + |1 - 0| + |1 - 0| + |1 - 0|}{4} = 0 \tag{12}$$
Furthermore, if $A = B = (a_1, b_1, c_1, d_1)$ (i.e., A and B are identical trapezoidal fuzzy numbers), then
$$S(A, B) = 1 - \frac{|a_1 - a_1| + |b_1 - b_1| + |c_1 - c_1| + |d_1 - d_1|}{4} = 1 \tag{13}$$
Let x and y be two real values between zero and one. Then x and y can be represented by the trapezoidal fuzzy numbers $x = (x, x, x, x)$ and $y = (y, y, y, y)$. Based on formula (11), the degree of similarity between x and y can be evaluated as follows:
$$S(x, y) = 1 - \frac{|x - y| + |x - y| + |x - y| + |x - y|}{4} = 1 - |x - y| \tag{14}$$
This result coincides with the one we showed in Chen et al. (1989).
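Formula (11) and its special cases (12) to (14) can be checked in a few lines. This sketch represents a trapezoidal fuzzy number as a plain 4-tuple; the helper names `similarity` and `crisp` are illustrative, not from the paper.

```python
def similarity(A, B) -> float:
    """Formula (11): S(A, B) = 1 - (|a1-a2| + |b1-b2| + |c1-c2| + |d1-d2|) / 4.

    A and B are 4-tuples (a, b, c, d) with components in [0, 1], so S lies in [0, 1].
    """
    if len(A) != 4 or len(B) != 4:
        raise ValueError("trapezoidal fuzzy numbers need exactly four parameters")
    return 1.0 - sum(abs(p - q) for p, q in zip(A, B)) / 4.0


def crisp(x: float):
    """Represent a real value x in [0, 1] as the trapezoid (x, x, x, x)."""
    return (x, x, x, x)


print(similarity((1, 1, 1, 1), (0, 0, 0, 0)))                    # case of formula (12)
print(similarity((0.2, 0.3, 0.4, 0.5), (0.2, 0.3, 0.4, 0.5)))    # case of formula (13)
print(similarity(crisp(0.75), crisp(0.25)))                      # case of formula (14)
```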
CONCEPT NETWORKS AND CONCEPT MATRICES

Lucarella and Morara (1991) presented concept networks for fuzzy information retrieval. A concept network includes nodes and directed links, where each node represents a concept or a document. Each directed link either connects two concepts or is directed from a concept $C_i$ to a document $d_j$, and is labeled with a real value between zero and one. If $C_i \xrightarrow{\mu} C_j$, then the degree of relevance from concept $C_i$ to concept $C_j$ is $\mu$, where $\mu \in [0, 1]$. If $C_i \xrightarrow{\mu} d_j$, then the degree of relevance of document $d_j$ with respect to concept $C_i$ is $\mu$, where $\mu \in [0, 1]$. Figure 3 shows a concept network adapted from Lucarella and Morara (1991), where $C_1, C_2, \ldots, C_7$ are concepts and $d_1, d_2, d_3, d_4$ are documents.

Figure 3. A concept network.

In Chen and Wang (1993, 1995a) and Wang and Chen (1993), we extended the work of Lucarella and Morara (1991) to allow the directed links in a concept network to be associated with real intervals in [0, 1]. However, allowing the directed links to be associated with linguistic terms or with trapezoidal fuzzy numbers parameterized by $(a, b, c, d)$, where $0 \le a \le b \le c \le d \le 1$, gives more flexibility. Thus, in this paper, we further extend the works of Chen and Wang (1993, 1995a), Lucarella and Morara (1991), and Wang and Chen (1993) to allow the directed links in a concept network to be associated with linguistic terms or trapezoidal fuzzy numbers. The set of linguistic terms used in this paper and their corresponding trapezoidal fuzzy numbers are shown in Table 1.

In Chen and Wang (1993, 1995a) and Wang and Chen (1993), we presented the definitions of concept matrices for modeling concept networks. The definitions of concept matrices and of the transitive closure of a concept matrix are reviewed as follows.
Definition 1: Let C be a set of concepts, $C = \{C_1, C_2, \ldots, C_n\}$. A concept matrix M is a fuzzy matrix (Kandel, 1986); $M(C_i, C_j)$ represents the relevant value from concept $C_i$ to concept $C_j$, where $M(C_i, C_j) \in [0, 1]$. A concept matrix M has the following properties:

1. Reflexivity: $M(C_i, C_i) = 1,\ \forall C_i \in C$.
2. M need not be symmetric: in general, $M(C_i, C_j) \neq M(C_j, C_i)$.
3. Transitivity: $M(C_i, C_k) \ge \max_{C_j \in C} \min\big(M(C_i, C_j), M(C_j, C_k)\big)$.

Table 1. Linguistic terms and their corresponding trapezoidal fuzzy numbers

Linguistic term     Trapezoidal fuzzy number
Nonrelevant         (0, 0, 0, 0)
Very low            (0, 0, 0.02, 0.07)
Low                 (0.04, 0.1, 0.18, 0.23)
Medium low          (0.17, 0.22, 0.36, 0.42)
Medium              (0.32, 0.42, 0.58, 0.65)
Medium high         (0.58, 0.63, 0.80, 0.86)
High                (0.72, 0.78, 0.92, 0.97)
Very high           (0.975, 0.98, 1, 1)
Fully relevant      (1, 1, 1, 1)

Definition 2: Let M be a concept matrix,
$$M = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n}\\ f_{21} & f_{22} & \cdots & f_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{bmatrix}$$
where n is the number of concepts in a concept network, $f_{ij} \in [0, 1]$, $1 \le i \le n$, $1 \le j \le n$, and let
$$M^2 = M \circ M = \begin{bmatrix}
\bigvee_{k=1}^{n}(f_{1k} \wedge f_{k1}) & \bigvee_{k=1}^{n}(f_{1k} \wedge f_{k2}) & \cdots & \bigvee_{k=1}^{n}(f_{1k} \wedge f_{kn})\\
\bigvee_{k=1}^{n}(f_{2k} \wedge f_{k1}) & \bigvee_{k=1}^{n}(f_{2k} \wedge f_{k2}) & \cdots & \bigvee_{k=1}^{n}(f_{2k} \wedge f_{kn})\\
\vdots & \vdots & \ddots & \vdots\\
\bigvee_{k=1}^{n}(f_{nk} \wedge f_{k1}) & \bigvee_{k=1}^{n}(f_{nk} \wedge f_{k2}) & \cdots & \bigvee_{k=1}^{n}(f_{nk} \wedge f_{kn})
\end{bmatrix} \tag{15}$$
where $\vee$ and $\wedge$ are the maximum and minimum operators, respectively. Then there exists an integer $p \le n - 1$ such that $M^p = M^{p+1} = M^{p+2} = \cdots$ (see Kandel, 1986, p. 117). Let $Q = M^p$. Q is called the transitive closure of the concept matrix M.
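Definition 2 is the classical max-min closure. The sketch below assumes M is given as a list of lists of floats and is reflexive, as property 1 of Definition 1 requires; the function names and the three-concept matrix are hypothetical.

```python
def maxmin_compose(X, Y):
    """Entry (i, j) of X ∘ Y: max over k of min(X[i][k], Y[k][j]), as in formula (15)."""
    return [[max(min(X[i][k], Y[k][j]) for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transitive_closure(M):
    """Return Q = M^p, the first power of M with M^p = M^(p+1).

    Assumes M is square and reflexive; then each power is elementwise no smaller
    than the previous one, so the loop stops once two successive powers agree.
    """
    power = M
    for _ in range(len(M)):
        nxt = maxmin_compose(power, M)
        if nxt == power:
            return power
        power = nxt
    raise RuntimeError("powers did not stabilize; is M reflexive?")


# Hypothetical three-concept matrix: C1 -> C2 with 0.8, C2 -> C3 with 0.6.
M = [[1.0, 0.8, 0.0],
     [0.0, 1.0, 0.6],
     [0.0, 0.0, 1.0]]
Q = transitive_closure(M)
print(Q)
```

Here the closure adds the implicit link C1 -> C3 with value min(0.8, 0.6) = 0.6, which is exactly what property 3 (transitivity) demands.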
In this paper, we allow the directed links in a concept network to be associated with the linguistic terms shown in Table 1 or with trapezoidal fuzzy numbers parameterized by $(a, b, c, d)$, where $0 \le a \le b \le c \le d \le 1$, and we use fuzzy concept matrices to model the concept networks. The definition of fuzzy concept matrices is presented as follows.
Definition 3: Let C be a set of concepts, $C = \{C_1, C_2, \ldots, C_n\}$, and let F be a fuzzy concept matrix. $F(C_i, C_j) = (a_{ij}, b_{ij}, c_{ij}, d_{ij})$ indicates that the fuzzy relevant value from concept $C_i$ to concept $C_j$ is represented by the trapezoidal fuzzy number $(a_{ij}, b_{ij}, c_{ij}, d_{ij})$, where $0 \le a_{ij} \le b_{ij} \le c_{ij} \le d_{ij} \le 1$.
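Definition 4: Let F be a fuzzy concept matrix,
$$F = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n}\\ A_{21} & A_{22} & \cdots & A_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix}$$
where n is the number of concepts, the $A_{ij}$ are trapezoidal fuzzy numbers, $A_{ij} = (a_{ij}, b_{ij}, c_{ij}, d_{ij})$, $0 \le a_{ij} \le b_{ij} \le c_{ij} \le d_{ij} \le 1$, $1 \le i \le n$, and $1 \le j \le n$, and let
$$F^2 = F \circ F = \begin{bmatrix}
\mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{1k} \mathbin{\tilde{\wedge}} A_{k1}) & \mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{1k} \mathbin{\tilde{\wedge}} A_{k2}) & \cdots & \mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{1k} \mathbin{\tilde{\wedge}} A_{kn})\\
\mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{2k} \mathbin{\tilde{\wedge}} A_{k1}) & \mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{2k} \mathbin{\tilde{\wedge}} A_{k2}) & \cdots & \mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{2k} \mathbin{\tilde{\wedge}} A_{kn})\\
\vdots & \vdots & \ddots & \vdots\\
\mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{nk} \mathbin{\tilde{\wedge}} A_{k1}) & \mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{nk} \mathbin{\tilde{\wedge}} A_{k2}) & \cdots & \mathop{\tilde{\bigvee}}_{k=1}^{n}(A_{nk} \mathbin{\tilde{\wedge}} A_{kn})
\end{bmatrix} \tag{16}$$
where $\tilde{\wedge}$ and $\tilde{\vee}$ represent the AND and OR operators of trapezoidal fuzzy numbers (formulas (6) and (7)), respectively. Then there exists an integer p, $p \le n - 1$, such that $F^p = F^{p+1} = F^{p+2} = \cdots$. Let $Q = F^p$. Q is called the transitive closure of the fuzzy concept matrix F.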
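Because the AND and OR of formulas (6) and (7) act on each parameter separately, the closure in Definition 4 is the max-min closure applied to each of the four parameter matrices. The sketch below assumes quadruples are stored as Python tuples; the function names are illustrative. As input it uses the fuzzy concept matrix F from the worked example in the next section.

```python
ZERO = (0.0, 0.0, 0.0, 0.0)


def t_and(A, B):
    """Formula (6): componentwise minimum of two trapezoidal fuzzy numbers."""
    return tuple(min(x, y) for x, y in zip(A, B))


def t_or(A, B):
    """Formula (7): componentwise maximum of two trapezoidal fuzzy numbers."""
    return tuple(max(x, y) for x, y in zip(A, B))


def fuzzy_compose(X, Y):
    """Entry (i, j): fuzzy OR over k of (X[i][k] AND Y[k][j]), as in formula (16)."""
    out = []
    for row in X:
        new_row = []
        for j in range(len(Y[0])):
            acc = ZERO  # neutral element for OR on quadruples with components >= 0
            for k, x in enumerate(row):
                acc = t_or(acc, t_and(x, Y[k][j]))
            new_row.append(acc)
        out.append(new_row)
    return out


def fuzzy_closure(F):
    """Return Q = F^p, the first power of F with F^p = F^(p+1)."""
    power = F
    for _ in range(len(F)):
        nxt = fuzzy_compose(power, F)
        if nxt == power:
            return power
        power = nxt
    raise RuntimeError("powers did not stabilize; is F reflexive?")


ONE = (1.0, 1.0, 1.0, 1.0)
VH = (0.975, 0.98, 1.0, 1.0)   # "Very high" in Table 1
MH = (0.58, 0.63, 0.80, 0.86)  # "Medium high" in Table 1
F = [[ONE, VH, ZERO, ZERO],
     [ZERO, ONE, MH, VH],
     [ZERO, ZERO, ONE, ZERO],
     [ZERO, ZERO, ZERO, ONE]]
Q = fuzzy_closure(F)
for row in Q:
    print(row)
```

The implicit link C1 -> C3 comes out as Very high AND Medium high, i.e. Medium high, which matches the transitive closure Q computed by hand in the example.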
FUZZY QUERY PROCESSING TECHNIQUES FOR DOCUMENT RETRIEVAL

Let D be a set of documents, $D = \{d_1, d_2, \ldots, d_m\}$, and let C be a set of concepts, $C = \{C_1, C_2, \ldots, C_n\}$. A document in a document retrieval system is generally described by a set of concepts, with each concept representing a topic. The relations between documents and concepts can be represented by a document descriptor matrix D, shown as follows:
$$D = \begin{array}{c|cccc}
 & C_1 & C_2 & \cdots & C_n\\ \hline
d_1 & B_{11} & B_{12} & \cdots & B_{1n}\\
d_2 & B_{21} & B_{22} & \cdots & B_{2n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
d_m & B_{m1} & B_{m2} & \cdots & B_{mn}
\end{array}$$
where m is the number of documents, n is the number of concepts, and $B_{ij}$ is a trapezoidal fuzzy number parameterized by a quadruple, representing the degree of relevance of document $d_i$ with respect to concept $C_j$, $1 \le i \le m$, and $1 \le j \le n$. In a document descriptor matrix D, the degree of relevance of each document with respect to a specific concept is determined by experts. However, an expert may neglect the degree of relevance of a certain document with respect to some specific concepts. Because concepts may not be independent of each other, the transitive closure Q of the fuzzy concept matrix F can be used to evaluate the implicit relevant values of each document with respect to specific concepts, which remedies this problem. Let D be the document descriptor matrix above and let Q be the transitive closure of the fuzzy concept matrix F,
$$Q = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1n}\\ S_{21} & S_{22} & \cdots & S_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ S_{n1} & S_{n2} & \cdots & S_{nn} \end{bmatrix}$$
where $B_{ij}$ and $S_{ij}$ are trapezoidal fuzzy numbers parameterized by quadruples, with $1 \le i \le m$, $1 \le j \le n$ for $B_{ij}$ and $1 \le i, j \le n$ for $S_{ij}$. Let
$$D^* = D \circ Q = \begin{bmatrix}
\mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{1k} \mathbin{\tilde{\wedge}} S_{k1}) & \mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{1k} \mathbin{\tilde{\wedge}} S_{k2}) & \cdots & \mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{1k} \mathbin{\tilde{\wedge}} S_{kn})\\
\mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{2k} \mathbin{\tilde{\wedge}} S_{k1}) & \mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{2k} \mathbin{\tilde{\wedge}} S_{k2}) & \cdots & \mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{2k} \mathbin{\tilde{\wedge}} S_{kn})\\
\vdots & \vdots & \ddots & \vdots\\
\mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{mk} \mathbin{\tilde{\wedge}} S_{k1}) & \mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{mk} \mathbin{\tilde{\wedge}} S_{k2}) & \cdots & \mathop{\tilde{\bigvee}}_{k=1}^{n}(B_{mk} \mathbin{\tilde{\wedge}} S_{kn})
\end{bmatrix} \tag{17}$$
where $\tilde{\wedge}$ and $\tilde{\vee}$ are the AND and OR operators of the trapezoidal fuzzy numbers, respectively. The document descriptor matrix $D^*$ indicates the degrees of relevance of each document with respect to specific concepts and is used as the basis for the similarity measures between users' queries and documents, as described later.

In a fuzzy information retrieval system, the user's query can be described by a query descriptor Q represented by a query descriptor matrix q, that is,
$$Q = \{(C_1, V_1), (C_2, V_2), \ldots, (C_n, V_n)\}, \qquad q = [V_1, V_2, \ldots, V_n]$$
where $V_i$ is a trapezoidal fuzzy number parameterized by a quadruple, $1 \le i \le n$, representing the degree of strength with which the desired documents contain concept $C_i$. If the user considers that certain concepts may be neglected, then the user does not have to assign degrees of strength to such concepts in the query descriptor vector q. The symbol "−" is used for labeling a neglected concept. Thus, if $V_i =$ "−", concept $C_i$ is a neglected concept, and it is not considered in the document retrieval process. Let $d_i^*$ denote the ith row of the document descriptor matrix $D^*$, $d_i^* = [P_{i1}, P_{i2}, \ldots, P_{in}]$, and let q be the query descriptor matrix, $q = [V_1, V_2, \ldots, V_n]$, where $P_{ij}$ and $V_j$ are trapezoidal fuzzy numbers,
$$P_{ij} = (a_{ij}, b_{ij}, c_{ij}, d_{ij}), \qquad V_j = (w_j, x_j, y_j, z_j)$$
$1 \le j \le n$, $1 \le i \le m$, n is the number of concepts, and m is the number of documents. Let $q(j)$ denote the jth component of the query descriptor matrix q. If $q(j) =$ "−", the concept $C_j$ is a neglected concept with respect to the fuzzy query. Based on formula (11), the degree of similarity between $d_i^*$ and q can be evaluated as follows:
$$RS(d_i) = \frac{1}{k} \sum_{\substack{j = 1, \ldots, n\\ q(j) \neq \text{"-"}}} S(P_{ij}, V_j) \tag{18}$$
where
$$S(P_{ij}, V_j) = 1 - \frac{|a_{ij} - w_j| + |b_{ij} - x_j| + |c_{ij} - y_j| + |d_{ij} - z_j|}{4} \tag{19}$$
$RS(d_i) \in [0, 1]$, $1 \le i \le m$, and k is the number of concepts not neglected in the query. The retrieval status value $RS(d_i)$ indicates the degree of similarity between the query and the document $d_i$, where $1 \le i \le m$. The larger the value of $RS(d_i)$, the higher the similarity between the query and the document $d_i$.
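Formulas (18) and (19) average the similarity over the concepts the user did not neglect. In the sketch below, `None` is an assumed encoding of the neglected symbol "−" and the function names are hypothetical. The sample rows are rows $d_1$ and $d_3$ of $D^*$ from the worked example later in this section, scored against an unweighted query made up for illustration.

```python
def similarity(P, V) -> float:
    """Formula (19): 1 - (|a-w| + |b-x| + |c-y| + |d-z|) / 4."""
    return 1.0 - sum(abs(p - v) for p, v in zip(P, V)) / 4.0


def retrieval_status(row, query) -> float:
    """Formula (18): average of S(P_ij, V_j) over the concepts not neglected.

    `row` is the ith row of D* (one quadruple per concept); `query` holds one
    quadruple per concept, with None standing in for the neglected symbol "-".
    """
    scores = [similarity(P, V) for P, V in zip(row, query) if V is not None]
    if not scores:
        raise ValueError("every concept is neglected; RS(d_i) is undefined")
    return sum(scores) / len(scores)  # len(scores) plays the role of k


# Rows d1 and d3 of D* from the worked example, and a hypothetical
# unweighted query that neglects C2 and C3.
d1 = [(0.2, 0.3, 0.4, 0.5), (0.5, 0.6, 0.7, 0.8), (1, 1, 1, 1), (0.5, 0.6, 0.7, 0.8)]
d3 = [(0.5, 0.6, 0.7, 0.8), (1, 1, 1, 1), (0.58, 0.63, 0.80, 0.86), (0.975, 0.98, 1, 1)]
q = [(0.6, 0.7, 0.8, 0.9), None, None, (0.9, 0.95, 0.95, 1.0)]
print(retrieval_status(d1, q), retrieval_status(d3, q))
```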
Consider the following OR-connected query:
$$q_1 \text{ OR } q_2$$
where $q_1$ and $q_2$ are query descriptor matrices. In this case, the degree of similarity between the query and the documents can be evaluated as follows:
$$RS^*(d_i) = \max\big(RS_1(d_i), RS_2(d_i)\big) \tag{20}$$
where $RS_1(d_i)$ represents the degree of similarity between the query descriptor matrix $q_1$ and the ith row of the document descriptor matrix $D^*$, $RS_2(d_i)$ represents the degree of similarity between the query descriptor matrix $q_2$ and the ith row of the document descriptor matrix $D^*$, the retrieval status value $RS^*(d_i)$ represents the degree of similarity of the query with respect to the document $d_i$, and $1 \le i \le m$. The information retrieval system displays every document whose retrieval status value is greater than a threshold value $\lambda$, where $\lambda \in [0, 1]$, in sequential order from the document with the highest retrieval status value to the one with the lowest.
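Formula (20) and the thresholded, ranked output can be sketched as below. This assumes the unweighted score of formula (18), with `None` marking a neglected concept; the two-concept collection, the queries, and the function names are all hypothetical.

```python
def similarity(P, V) -> float:
    """Formula (19)."""
    return 1.0 - sum(abs(p - v) for p, v in zip(P, V)) / 4.0


def retrieval_status(row, query) -> float:
    """Formula (18); None marks a neglected concept."""
    scores = [similarity(P, V) for P, V in zip(row, query) if V is not None]
    if not scores:
        raise ValueError("every concept is neglected")
    return sum(scores) / len(scores)


def rs_or(row, q1, q2) -> float:
    """Formula (20): an OR-connected query takes the larger of the two scores."""
    return max(retrieval_status(row, q1), retrieval_status(row, q2))


def retrieve(scores: dict, threshold: float) -> list:
    """Documents whose score exceeds the threshold, highest score first."""
    hits = [(doc, s) for doc, s in scores.items() if s > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical two-concept collection and queries.
docs = {
    "d1": [(0.25, 0.25, 0.25, 0.25), (0.75, 0.75, 0.75, 0.75)],
    "d2": [(0.5, 0.5, 0.5, 0.5), (0.5, 0.5, 0.5, 0.5)],
    "d3": [(0.0, 0.0, 0.0, 0.0), (0.25, 0.25, 0.25, 0.25)],
}
q1 = [(1.0, 1.0, 1.0, 1.0), None]   # strongly about C1, C2 neglected
q2 = [None, (1.0, 1.0, 1.0, 1.0)]   # strongly about C2, C1 neglected
scores = {doc: rs_or(row, q1, q2) for doc, row in docs.items()}
print(retrieve(scores, threshold=0.3))
```

Taking the maximum means a document only needs to match one of the two sub-queries well; d1 scores poorly on $q_1$ but is still ranked first because of $q_2$.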
Weighted-fuzzy queries can also be processed by our method. In weighted-fuzzy queries, a query expression can be represented by a query descriptor matrix q shown as follows:
$$q = [(V_1, W_1), (V_2, W_2), \ldots, (V_n, W_n)]$$
where $V_j$ and $W_j$ are trapezoidal fuzzy numbers parameterized by quadruples, $W_j$ represents the weight of concept $C_j$, and $1 \le j \le n$. Let the ith row of the document descriptor matrix $D^*$ be $[P_{i1}, P_{i2}, \ldots, P_{in}]$, where $P_{i1}, P_{i2}, \ldots, P_{in}$ are trapezoidal fuzzy numbers parameterized by quadruples. Then the degree of similarity between the weighted-fuzzy query and the document $d_i$ can be calculated as follows:
$$RS_w(d_i) = \Bigg(\bigoplus_{\substack{j = 1, \ldots, n\\ q(j) \neq \text{"-"}}} S(P_{ij}, V_j) \otimes W_j\Bigg) \mathbin{\tilde{\div}} \Bigg(\bigoplus_{\substack{j = 1, \ldots, n\\ q(j) \neq \text{"-"}}} W_j\Bigg) \tag{21}$$
where $\otimes$ and $\tilde{\div}$ are the multiplication operator and the ratio operator of the trapezoidal fuzzy numbers (formulas (9) and (8)), respectively, and $\bigoplus$ denotes the summation of trapezoidal fuzzy numbers. The retrieval status value $RS_w(d_i)$ is a trapezoidal fuzzy number indicating the degree of similarity between the weighted-fuzzy query and the document $d_i$, where $1 \le i \le m$. Assume that
$$RS_w(d_1) = (a_1, b_1, c_1, d_1),\quad RS_w(d_2) = (a_2, b_2, c_2, d_2),\quad \ldots,\quad RS_w(d_m) = (a_m, b_m, c_m, d_m) \tag{22}$$
Based on formula (10), the retrieval status value $RS_w(d_i)$ can be defuzzified into a crisp real value, where $1 \le i \le m$. In this case, the defuzzified value of $RS_w(d_i)$ is equal to $(a_i + b_i + c_i + d_i)/4$. Let the defuzzified value of $RS_w(d_i)$ be $k_i$, where $k_i \in [0, 1]$ and $1 \le i \le m$, and let $\lambda$ be a threshold value, $\lambda \in [0, 1]$. The information retrieval system displays every document having a defuzzified retrieval status value $k_i$ greater than the threshold value $\lambda$, in sequential order from the document with the highest defuzzified retrieval status value to the one with the lowest.
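Formula (21) followed by the defuzzification of formula (10) can be sketched as follows. The sketch assumes plain 4-tuples for trapezoidal fuzzy numbers, `None` for a neglected concept, and strictly positive weight sums, since the ratio (8) divides componentwise; the document row, the weights, and the function names are hypothetical.

```python
def similarity(P, V) -> float:
    """Formula (19)."""
    return 1.0 - sum(abs(p - v) for p, v in zip(P, V)) / 4.0


def weighted_rs(row, query):
    """Formula (21): RS_w(d_i) as a trapezoidal fuzzy number.

    `query` holds a (V_j, W_j) pair per concept, or None for a neglected concept.
    Numerator: fuzzy sum of S(P_ij, V_j) ⊗ W_j (formulas (2) and (9));
    denominator: fuzzy sum of W_j; result: componentwise ratio (formula (8)).
    """
    num = (0.0, 0.0, 0.0, 0.0)
    den = (0.0, 0.0, 0.0, 0.0)
    for P, term in zip(row, query):
        if term is None:
            continue
        V, W = term
        s = similarity(P, V)
        num = tuple(n + s * w for n, w in zip(num, W))
        den = tuple(t + w for t, w in zip(den, W))
    if 0.0 in den:
        raise ValueError("weight sums must be strictly positive for the ratio (8)")
    return tuple(n / t for n, t in zip(num, den))


def defuzzify(T) -> float:
    """Formula (10): (a + b + c + d) / 4."""
    return sum(T) / 4.0


# Hypothetical document row and weighted query over two concepts.
row = [(0.5, 0.5, 0.5, 0.5), (1.0, 1.0, 1.0, 1.0)]
query = [((1.0, 1.0, 1.0, 1.0), (0.25, 0.5, 0.5, 0.75)),
         ((1.0, 1.0, 1.0, 1.0), (0.5, 0.5, 0.75, 1.0))]
rs = weighted_rs(row, query)
print(rs, defuzzify(rs))
```

Keeping $RS_w(d_i)$ as a quadruple until the last step means the threshold test is applied to a single crisp value $k_i$, exactly as the paragraph above describes.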
In the following, we use an example to illustrate weighted-fuzzy query processing for document retrieval.
Example: Assume that the retrieval threshold value $\lambda$ is 0.65, and assume that there are four concepts $C_1, C_2, C_3, C_4$ and five documents $d_1, d_2, d_3, d_4$, and $d_5$. Furthermore, assume that the document descriptor matrix D and the fuzzy concept matrix F have the following forms:
$$D = \begin{array}{c|cccc}
 & C_1 & C_2 & C_3 & C_4\\ \hline
d_1 & (0.2, 0.3, 0.4, 0.5) & (0.5, 0.6, 0.7, 0.8) & (1, 1, 1, 1) & (0, 0, 0, 0)\\
d_2 & (1, 1, 1, 1) & (0.6, 0.7, 0.8, 0.9) & (0.3, 0.4, 0.5, 0.6) & (0.5, 0.6, 0.7, 0.8)\\
d_3 & (0.5, 0.6, 0.7, 0.8) & (1, 1, 1, 1) & (0.1, 0.2, 0.3, 0.4) & (0.7, 0.7, 0.7, 0.7)\\
d_4 & (0, 0, 0, 0) & (0.3, 0.4, 0.5, 0.6) & (0.5, 0.6, 0.7, 0.8) & (1, 1, 1, 1)\\
d_5 & (0.3, 0.4, 0.5, 0.6) & (0.4, 0.5, 0.6, 0.7) & (0, 0, 0, 0) & (0.5, 0.6, 0.7, 0.8)
\end{array}$$
$$F = \begin{array}{c|cccc}
 & C_1 & C_2 & C_3 & C_4\\ \hline
C_1 & (1, 1, 1, 1) & (0.975, 0.98, 1, 1) & (0, 0, 0, 0) & (0, 0, 0, 0)\\
C_2 & (0, 0, 0, 0) & (1, 1, 1, 1) & (0.58, 0.63, 0.80, 0.86) & (0.975, 0.98, 1, 1)\\
C_3 & (0, 0, 0, 0) & (0, 0, 0, 0) & (1, 1, 1, 1) & (0, 0, 0, 0)\\
C_4 & (0, 0, 0, 0) & (0, 0, 0, 0) & (0, 0, 0, 0) & (1, 1, 1, 1)
\end{array}$$
In this case, the transitive closure Q of the fuzzy concept matrix F can be calculated as follows:
$$Q = \begin{array}{c|cccc}
 & C_1 & C_2 & C_3 & C_4\\ \hline
C_1 & (1, 1, 1, 1) & (0.975, 0.98, 1, 1) & (0.58, 0.63, 0.80, 0.86) & (0.975, 0.98, 1, 1)\\
C_2 & (0, 0, 0, 0) & (1, 1, 1, 1) & (0.58, 0.63, 0.80, 0.86) & (0.975, 0.98, 1, 1)\\
C_3 & (0, 0, 0, 0) & (0, 0, 0, 0) & (1, 1, 1, 1) & (0, 0, 0, 0)\\
C_4 & (0, 0, 0, 0) & (0, 0, 0, 0) & (0, 0, 0, 0) & (1, 1, 1, 1)
\end{array}$$
The document descriptor matrix $D^*$ can be obtained from the document descriptor matrix D and the transitive closure Q of the fuzzy concept matrix F as follows:
$$D^* = D \circ Q = \begin{array}{c|cccc}
 & C_1 & C_2 & C_3 & C_4\\ \hline
d_1 & (0.2, 0.3, 0.4, 0.5) & (0.5, 0.6, 0.7, 0.8) & (1, 1, 1, 1) & (0.5, 0.6, 0.7, 0.8)\\
d_2 & (1, 1, 1, 1) & (0.975, 0.98, 1, 1) & (0.58, 0.63, 0.80, 0.86) & (0.975, 0.98, 1, 1)\\
d_3 & (0.5, 0.6, 0.7, 0.8) & (1, 1, 1, 1) & (0.58, 0.63, 0.80, 0.86) & (0.975, 0.98, 1, 1)\\
d_4 & (0, 0, 0, 0) & (0.3, 0.4, 0.5, 0.6) & (0.5, 0.6, 0.7, 0.8) & (1, 1, 1, 1)\\
d_5 & (0.3, 0.4, 0.5, 0.6) & (0.4, 0.5, 0.6, 0.7) & (0.4, 0.5, 0.6, 0.7) & (0.5, 0.6, 0.7, 0.8)
\end{array}$$
If the user's weighted-fuzzy query is represented by the following query descriptor matrix q, in which concepts $C_2$ and $C_3$ are neglected:
$$q = \big[\big((0.6, 0.7, 0.8, 0.9), (0.6, 0.7, 0.8, 0.9)\big),\ (-, -),\ (-, -),\ \big((0.9, 0.95, 0.95, 1), (0.5, 0.6, 0.7, 0.8)\big)\big]$$
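Then, based on formula (21), we get the following results:
$$\begin{aligned}
RS_w(d_1) &= \Big[\Big(1 - \tfrac{|0.2 - 0.6| + |0.3 - 0.7| + |0.4 - 0.8| + |0.5 - 0.9|}{4}\Big) \otimes (0.6, 0.7, 0.8, 0.9) \oplus \Big(1 - \tfrac{|0.5 - 0.9| + |0.6 - 0.95| + |0.7 - 0.95| + |0.8 - 1|}{4}\Big) \otimes (0.5, 0.6, 0.7, 0.8)\Big]\\
&\qquad \mathbin{\tilde{\div}} \big[(0.6, 0.7, 0.8, 0.9) \oplus (0.5, 0.6, 0.7, 0.8)\big]\\
&= \big[0.6 \otimes (0.6, 0.7, 0.8, 0.9) \oplus 0.7 \otimes (0.5, 0.6, 0.7, 0.8)\big] \mathbin{\tilde{\div}} (1.1, 1.3, 1.5, 1.7)\\
&= (0.64545, 0.64615, 0.64667, 0.64706)
\end{aligned}$$
$$\begin{aligned}
RS_w(d_2) &= \Big[\Big(1 - \tfrac{|1 - 0.6| + |1 - 0.7| + |1 - 0.8| + |1 - 0.9|}{4}\Big) \otimes (0.6, 0.7, 0.8, 0.9) \oplus \Big(1 - \tfrac{|0.975 - 0.9| + |0.98 - 0.95| + |1 - 0.95| + |1 - 1|}{4}\Big) \otimes (0.5, 0.6, 0.7, 0.8)\Big]\\
&\qquad \mathbin{\tilde{\div}} \big[(0.6, 0.7, 0.8, 0.9) \oplus (0.5, 0.6, 0.7, 0.8)\big]\\
&= \big[0.75 \otimes (0.6, 0.7, 0.8, 0.9) \oplus 0.96125 \otimes (0.5, 0.6, 0.7, 0.8)\big] \mathbin{\tilde{\div}} (1.1, 1.3, 1.5, 1.7)\\
&= (0.84603, 0.8475, 0.84859, 0.84941)
\end{aligned}$$
$$\begin{aligned}
RS_w(d_3) &= \Big[\Big(1 - \tfrac{|0.5 - 0.6| + |0.6 - 0.7| + |0.7 - 0.8| + |0.8 - 0.9|}{4}\Big) \otimes (0.6, 0.7, 0.8, 0.9) \oplus \Big(1 - \tfrac{|0.975 - 0.9| + |0.98 - 0.95| + |1 - 0.95| + |1 - 1|}{4}\Big) \otimes (0.5, 0.6, 0.7, 0.8)\Big]\\
&\qquad \mathbin{\tilde{\div}} \big[(0.6, 0.7, 0.8, 0.9) \oplus (0.5, 0.6, 0.7, 0.8)\big]\\
&= \big[0.9 \otimes (0.6, 0.7, 0.8, 0.9) \oplus 0.96125 \otimes (0.5, 0.6, 0.7, 0.8)\big] \mathbin{\tilde{\div}} (1.1, 1.3, 1.5, 1.7)\\
&= (0.92785, 0.92827, 0.92859, 0.92882)
\end{aligned}$$
$$\begin{aligned}
RS_w(d_4) &= \Big[\Big(1 - \tfrac{|0 - 0.6| + |0 - 0.7| + |0 - 0.8| + |0 - 0.9|}{4}\Big) \otimes (0.6, 0.7, 0.8, 0.9) \oplus \Big(1 - \tfrac{|1 - 0.9| + |1 - 0.95| + |1 - 0.95| + |1 - 1|}{4}\Big) \otimes (0.5, 0.6, 0.7, 0.8)\Big]\\
&\qquad \mathbin{\tilde{\div}} \big[(0.6, 0.7, 0.8, 0.9) \oplus (0.5, 0.6, 0.7, 0.8)\big]\\
&= \big[0.25 \otimes (0.6, 0.7, 0.8, 0.9) \oplus 0.95 \otimes (0.5, 0.6, 0.7, 0.8)\big] \mathbin{\tilde{\div}} (1.1, 1.3, 1.5, 1.7)\\
&= (0.56818, 0.57308, 0.57667, 0.57941)
\end{aligned}$$
$$\begin{aligned}
RS_w(d_5) &= \Big[\Big(1 - \tfrac{|0.3 - 0.6| + |0.4 - 0.7| + |0.5 - 0.8| + |0.6 - 0.9|}{4}\Big) \otimes (0.6, 0.7, 0.8, 0.9) \oplus \Big(1 - \tfrac{|0.5 - 0.9| + |0.6 - 0.95| + |0.7 - 0.95| + |0.8 - 1|}{4}\Big) \otimes (0.5, 0.6, 0.7, 0.8)\Big]\\
&\qquad \mathbin{\tilde{\div}} \big[(0.6, 0.7, 0.8, 0.9) \oplus (0.5, 0.6, 0.7, 0.8)\big]\\
&= \big[0.7 \otimes (0.6, 0.7, 0.8, 0.9) \oplus 0.7 \otimes (0.5, 0.6, 0.7, 0.8)\big] \mathbin{\tilde{\div}} (1.1, 1.3, 1.5, 1.7)\\
&= (0.7, 0.7, 0.7, 0.7)
\end{aligned}$$
Based on formula (10), we get the following results: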
The defuzzified value of $RS_w(d_1)$ is equal to $(0.64545 + 0.64615 + 0.64667 + 0.64706)/4 = 0.64633$.

The defuzzified value of $RS_w(d_2)$ is equal to $(0.84603 + 0.8475 + 0.84859 + 0.84941)/4 = 0.84788$.

The defuzzified value of $RS_w(d_3)$ is equal to $(0.92785 + 0.92827 + 0.92859 + 0.92882)/4 = 0.92838$.

The defuzzified value of $RS_w(d_4)$ is equal to $(0.56818 + 0.57308 + 0.57667 + 0.57941)/4 = 0.57434$.

The defuzzified value of $RS_w(d_5)$ is equal to $(0.7 + 0.7 + 0.7 + 0.7)/4 = 0.7$.
Because the retrieval threshold value $\lambda$ is 0.65, the documents $d_1$ and $d_4$ will not be retrieved, since their defuzzified retrieval status values are less than the threshold value; the documents $d_3$, $d_2$, and $d_5$ are retrieved, in that order. From these results, we can also see that document $d_3$ is the most suitable for the user's weighted-fuzzy query because it has the largest retrieval status value.
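The whole worked example can be replayed in a few dozen lines, which is a convenient check on the hand calculations. The sketch composes D with the transitive closure Q by formula (17), scores each row with formula (21), defuzzifies with formula (10), and applies the threshold $\lambda = 0.65$. Tuples stand in for trapezoidal fuzzy numbers and `None` for the neglected symbol "−"; these encodings and all function names are assumptions of the sketch, not of the paper.

```python
ZERO = (0.0, 0.0, 0.0, 0.0)
ONE = (1.0, 1.0, 1.0, 1.0)
VH = (0.975, 0.98, 1.0, 1.0)    # "Very high" (Table 1)
MH = (0.58, 0.63, 0.80, 0.86)   # "Medium high" (Table 1)


def t_and(A, B):                 # formula (6)
    return tuple(min(x, y) for x, y in zip(A, B))


def t_or(A, B):                  # formula (7)
    return tuple(max(x, y) for x, y in zip(A, B))


def compose(X, Y):
    """Formula (17): entry (i, j) is the fuzzy OR over k of X[i][k] AND Y[k][j]."""
    result = []
    for row in X:
        new_row = []
        for j in range(len(Y[0])):
            acc = ZERO
            for k, x in enumerate(row):
                acc = t_or(acc, t_and(x, Y[k][j]))
            new_row.append(acc)
        result.append(new_row)
    return result


def similarity(P, V):            # formula (19)
    return 1.0 - sum(abs(p - v) for p, v in zip(P, V)) / 4.0


def weighted_rs(row, query):     # formula (21)
    num, den = ZERO, ZERO
    for P, term in zip(row, query):
        if term is None:         # neglected concept
            continue
        V, W = term
        s = similarity(P, V)
        num = tuple(n + s * w for n, w in zip(num, W))
        den = tuple(t + w for t, w in zip(den, W))
    return tuple(n / t for n, t in zip(num, den))


D = {
    "d1": [(0.2, 0.3, 0.4, 0.5), (0.5, 0.6, 0.7, 0.8), ONE, ZERO],
    "d2": [ONE, (0.6, 0.7, 0.8, 0.9), (0.3, 0.4, 0.5, 0.6), (0.5, 0.6, 0.7, 0.8)],
    "d3": [(0.5, 0.6, 0.7, 0.8), ONE, (0.1, 0.2, 0.3, 0.4), (0.7, 0.7, 0.7, 0.7)],
    "d4": [ZERO, (0.3, 0.4, 0.5, 0.6), (0.5, 0.6, 0.7, 0.8), ONE],
    "d5": [(0.3, 0.4, 0.5, 0.6), (0.4, 0.5, 0.6, 0.7), ZERO, (0.5, 0.6, 0.7, 0.8)],
}
Q = [[ONE, VH, MH, VH],
     [ZERO, ONE, MH, VH],
     [ZERO, ZERO, ONE, ZERO],
     [ZERO, ZERO, ZERO, ONE]]
query = [((0.6, 0.7, 0.8, 0.9), (0.6, 0.7, 0.8, 0.9)), None, None,
         ((0.9, 0.95, 0.95, 1.0), (0.5, 0.6, 0.7, 0.8))]
THRESHOLD = 0.65

names = list(D)
D_star = dict(zip(names, compose([D[n] for n in names], Q)))
defuzzified = {n: sum(weighted_rs(D_star[n], query)) / 4.0 for n in names}  # formula (10)
retrieved = sorted((n for n in names if defuzzified[n] > THRESHOLD),
                   key=defuzzified.get, reverse=True)
for n in names:
    print(n, round(defuzzified[n], 5))
print("retrieved:", retrieved)
```

Run as written, this retrieves d3, d2, and d5 in that order and leaves out d1 and d4, in line with the conclusion drawn from the example.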
CONCLUSIONS

In this paper, we have presented a knowledge-based method for processing fuzzy queries and weighted-fuzzy queries for document retrieval, where fuzzy concept matrices are used for knowledge representation and the elements in fuzzy concept matrices are represented by trapezoidal fuzzy numbers parameterized by quadruples $(a, b, c, d)$, where $0 \le a \le b \le c \le d \le 1$. We also used an example to illustrate weighted-fuzzy query processing for document retrieval. The illustrated example shows that the proposed method involves only simple componentwise operations on trapezoidal fuzzy numbers and can therefore be executed efficiently. The proposed method is a significant improvement over methods based on Boolean algebra because it has fuzzy query processing capability. Furthermore, the proposed method is more flexible than the ones presented in Chen and Wang (1993, 1995a), Lucarella and Morara (1991), and Wang and Chen (1993) because it allows the system's users to perform fuzzy queries and weighted-fuzzy queries. The method consequently provides intelligent retrieval capability and supports flexible user queries.
REFERENCES

Chen, S. M. 1992a. A fuzzy reasoning technique based on the α-cuts operations of fuzzy numbers. Proceedings of the Second International Conference on Automation Technology, Taipei, Taiwan, Republic of China, pp. 147–154.
Chen, S. M. 1992b. An improved algorithm for inexact reasoning based on extended fuzzy production rules. Cybern. Syst. 23(5):463–481.
Chen, S. M. 1992c. A new approach to inexact reasoning for rule-based systems. Cybern. Syst. 23(6):561–582.
Chen, S. M. 1993. An inexact reasoning technique using linguistic rule matrix transformations. Proceedings of the IEEE 23rd International Symposium on Multiple-Valued Logic, Sacramento, CA, pp. 190–195.
Chen, S. M. 1994a. Using fuzzy reasoning techniques for fault diagnosis of the J-85 jet engines. Proceedings of the Third National Conference of Science and Technology of National Defense, Taipei, Taiwan, Republic of China, pp. 29–34.
Chen, S. M. 1994b. A new method for handling multicriteria fuzzy decision-making problems. Cybern. Syst. 25(3):409–420.
Chen, S. M. and J. Y. Wang. 1993. A new approach for fuzzy information retrieval. Proceedings of 1993 National Computer Symposium, Chiayi, Taiwan, Republic of China, vol. 2, pp. 767–774.
Chen, S. M. and J. Y. Wang. 1995a. Document retrieval using knowledge-based fuzzy information retrieval techniques. IEEE Trans. Syst. Man Cybern. 25(5):793–803.
Chen, S. M. and S. Y. Lin. 1995b. A new method for fuzzy risk analysis. Proceedings of 1995 Artificial Intelligence Workshop, Taipei, Taiwan, Republic of China, pp. 245–250.
Chen, S. M., J. S. Ke, and J. F. Chang. 1989. Techniques for handling multicriteria fuzzy decision-making problems. Proceedings of the 4th International Symposium on Computer and Information Sciences, Cesme, Turkey, vol. 2, pp. 919–925.
Chen, S. M., J. S. Ke, and J. F. Chang. 1991. An inexact reasoning algorithm for dealing with inexact knowledge. Int. J. Software Eng. Knowledge Eng. 1(3):227–241.
Her, G. T. and J. S. Ke. 1983. A fuzzy information retrieval system model. Proceedings of 1983 National Computer Symposium, Taiwan, Republic of China, pp. 147–155.
Kamel, M., B. Hadfield, and M. Ismail. 1990. Fuzzy query processing using clustering techniques. Inform. Process. Manage. 26(2):279–293.
Kandel, A. 1986. Fuzzy mathematical techniques with applications. Reading, MA: Addison-Wesley.
Kaufman, A. and M. M. Gupta. 1985. Introduction to fuzzy arithmetic: Theory and applications. New York: Van Nostrand Reinhold.
Kaufman, A. and M. M. Gupta. 1988. Fuzzy mathematical models in engineering and management science. Amsterdam: North-Holland.
Kraft, D. H. and D. A. Buell. 1983. Fuzzy sets and generalized Boolean retrieval systems. Int. J. Man Machine Stud. 19(1):45–56.
Lucarella, D. and R. Morara. 1991. FIRST: Fuzzy information retrieval system. J. Inform. Sci. 17:81–91.
Miyamoto, S. 1990. Information retrieval based on fuzzy associations. Fuzzy Sets Syst. 38:191–205.
Murai, T., M. Miyakoshi, and M. Shimbo. 1989. A fuzzy document retrieval method based on two-valued indexing. Fuzzy Sets Syst. 30:103–120.
Radechi, T. 1979. Fuzzy set theoretical approach to document retrieval. Inform. Process. Manage. 15:247–259.
Salton, G. and M. J. McGill. 1983. Introduction to modern information retrieval. New York: McGraw-Hill.
Tahani, V. 1976. A fuzzy model of document retrieval system. Inform. Process. Manage. 12:177–187.
Wang, J. Y. and S. M. Chen. 1993. A knowledge-based method for fuzzy information retrieval. Proceedings of the First Asian Fuzzy Systems Symposium, Singapore, November.
Zadeh, L. A. 1965. Fuzzy sets. Inform. Control 8:338–353.
Zemankova, M. 1989. FIIS: A fuzzy intelligent information system. Data Eng. 12(2):11–20.