Fuzzy Rough Sets with Hierarchical Quantitative Attributes

Tzung-Pei Hong1*, Yan-Liang Liou2 and Shyue-Liang Wang3

1 Department of Computer Science and Information Engineering
2 Department of Electrical Engineering
3 Department of Information Management

National University of Kaohsiung
Kaohsiung, 811, Taiwan, R.O.C.

[email protected], [email protected], [email protected]

Abstract

Machine learning can extract desired knowledge and ease the development bottleneck in building expert systems. Among the proposed approaches, deriving classification rules from training examples is the most common. Given a set of examples, a learning program tries to induce rules that describe each class. The rough-set theory has served as a good mathematical tool for dealing with data classification problems. It adopts the concept of equivalence classes to partition training instances according to some criteria. In the past, we proposed a fuzzy-rough approach to produce a set of certain and possible rules from quantitative data. Attributes are, however, usually organized into hierarchies in real applications. This paper thus extends our previous approach to deal with the problem of producing a set of cross-level maximally general fuzzy certain and possible rules from examples with hierarchical and quantitative attributes. The proposed approach combines the rough-set theory and the fuzzy-set theory in the learning process. It is more complex than learning from single-level values, but may derive more general knowledge from data. Fuzzy boundary approximations, instead of upper approximations, are used to find possible rules, thus reducing some subsumption checking. Some pruning heuristics are adopted in the proposed algorithm to avoid unnecessary search. A simple example is also given to illustrate the proposed approach.

Keywords: machine learning, rough set, certain rule, possible rule, hierarchical value, quantitative value.

---

*Corresponding author. Also at Department of Computer Science and Engineering, National Sun Yat-sen University, Taiwan.


1. Introduction

Expert systems have been widely used in domains where mathematical models cannot easily be built, human experts are not available or the cost of querying an expert is high. Although a wide variety of expert systems have been built, knowledge acquisition remains a development bottleneck. Usually, a knowledge engineer is needed to establish a dialog with a human expert and to encode the knowledge elicited into a knowledge base to produce an expert system. The process is, however, very time-consuming [1][17]. Building a large-scale expert system involves creating and extending a large knowledge base over the course of many months or years. Shortening the development time is thus the most important factor for the success of an expert system. Machine-learning techniques have thus been developed to ease the knowledge-acquisition bottleneck. Among the proposed approaches, deriving rules from training examples is the most common [7][11][12]. Given a set of examples, a learning program tries to induce rules that describe each class.

Recently, the rough-set theory has been used in reasoning and knowledge acquisition for expert systems [2][5][14]. It was proposed by Pawlak in 1982 [15][16] with the concept of equivalence classes as its basic principle. Several applications and extensions of the rough-set theory have also been proposed. Examples are Orlowska's reasoning with incomplete information [14], Germano and Alexandre's knowledge-base reduction [4], and Lingras and Yao's data mining [10]. Lambert-Torres et al. found unimportant attributes from lower and upper approximations and deleted them from a database [8]. Zhong et al. proposed a new incremental learning algorithm based on the generalization distribution table, which maintained the probabilistic relationships between the possible instances and the possible concepts [23]. Yao formed a stratified granulation structure with respect to different levels of rough set approximations by incrementally clustering objects with the same characteristics together [19]. Also, Lee et al. simplified classification rules for data mining using rough set theory [9]. Tsumoto presented a knowledge discovery system based on rough sets and attribute-oriented generalization [18]. It was used not only to acquire several sets of attributes important for classification, but also to evaluate how precisely the attributes of a database were able to classify data. Much research in this field is still in progress [21][22].

Training data in real-world applications sometimes consist of quantitative values. Fuzzy-set concepts are often used in intelligent systems to represent quantitative data as linguistic terms with membership functions, because of their simplicity and similarity to human reasoning. Dubois and Prade combined rough sets and fuzzy sets in order to get a more accurate account of imperfect information [3]. They built up a very good theoretical basis for fuzzy rough sets. Also, Nakamura predefined similarity matrices and applied fuzzy rough sets to logical reasoning [13]. In the past, we also proposed a method which combined rough-set theory and fuzzy-set theory to deal with the problem of producing a set of certain and possible rules from quantitative data [6].

Attributes are usually organized into hierarchies in real applications. Deriving rules on multiple concept levels may thus lead to the discovery of more general and important knowledge from data. It is, however, more complex than learning rules from training examples with single-level values. This paper thus extends our previous approach to deal with the problem of producing a set of cross-level maximally general fuzzy certain and possible rules from examples with hierarchical and quantitative attributes. Fuzzy boundary approximations, instead of upper approximations, are used to find possible rules, thus reducing some subsumption checking. Some pruning heuristics are adopted in the proposed algorithm to avoid unnecessary search. Rule effectiveness for future data is also derived from the membership values.

The remainder of this paper is organized as follows. The rough-set theory and the fuzzy-set concept are reviewed in Sections 2 and 3. Management of hierarchical attribute values by rough sets is described in Section 4. The notation and definitions used in this paper are given in Section 5. A new learning algorithm which can process hierarchical and quantitative attributes by fuzzy rough sets is proposed in Section 6. An example to illustrate the proposed algorithm is given in Section 7. Some discussion is given in Section 8. Conclusions and future work are finally given in Section 9.

2. Review of Related Work

2.1 The rough-set theory

The rough-set theory, proposed by Pawlak in 1982 [15][16], can serve as a new mathematical tool for dealing with data classification problems. It adopts the concept of equivalence classes to partition training instances according to some criteria. Two kinds of partitions are formed in the mining process: lower approximations and upper approximations, from which certain and possible rules can easily be derived.

Formally, let U be a set of training examples (objects), A be a set of attributes describing the examples, C be a set of classes, and $V_j$ be the value domain of an attribute $A_j$. Also let $v_j^{(i)}$ be the value of attribute $A_j$ for the i-th object Obj(i). When two objects Obj(i) and Obj(k) have the same value of attribute $A_j$ (that is, $v_j^{(i)} = v_j^{(k)}$), Obj(i) and Obj(k) are said to have an indiscernibility relation (or an equivalence relation) on attribute $A_j$. Also, if Obj(i) and Obj(k) have the same values for each attribute in subset B of A, Obj(i) and Obj(k) are also said to have an indiscernibility (equivalence) relation on attribute set B. These equivalence relations thus partition the object set U into disjoint subsets, denoted by U/B, and the partition including Obj(i) is denoted B(Obj(i)). The sets of equivalence classes for subset B are referred to as B-elementary sets.

The rough-set approach analyzes data according to two basic concepts, namely the lower and the upper approximations of a set. Let X be an arbitrary subset of the universe U, and B be an arbitrary subset of attribute set A. The lower and the upper approximations for B on X, denoted $B_*(X)$ and $B^*(X)$ respectively, are defined as follows:

$$B_*(X) = \{x \mid x \in U,\ B(x) \subseteq X\}, \text{ and}$$

$$B^*(X) = \{x \mid x \in U \text{ and } B(x) \cap X \neq \emptyset\}.$$

Elements in $B_*(X)$ can be classified as members of set X with full certainty using attribute set B, so $B_*(X)$ is called the lower approximation of X. Similarly, elements in $B^*(X)$ can be classified as members of the set X with only partial certainty using attribute set B, so $B^*(X)$ is called the upper approximation of X. After the lower and the upper approximations have been found, the rough-set theory can then be used to derive both certain and uncertain information and induce certain and possible rules from them.
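The two definitions above can be illustrated with a minimal Python sketch. The four objects, their attribute values and the target set X below are hypothetical toy data, not taken from the paper, and the function names are illustrative only.

```python
from collections import defaultdict

def partition(objects, attrs):
    """Group object ids into the equivalence classes U/B for attribute subset attrs."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attrs)  # identical keys mean indiscernible objects
        classes[key].add(obj_id)
    return list(classes.values())

def lower_approximation(classes, target):
    """Union of the equivalence classes completely contained in target."""
    result = set()
    for eq_class in classes:
        if eq_class <= target:
            result |= eq_class
    return result

def upper_approximation(classes, target):
    """Union of the equivalence classes that intersect target."""
    result = set()
    for eq_class in classes:
        if eq_class & target:
            result |= eq_class
    return result

# Hypothetical toy data: objects 3 and 4 are indiscernible on {SP, DP}.
objects = {
    1: {"SP": "H", "DP": "H"},
    2: {"SP": "H", "DP": "N"},
    3: {"SP": "N", "DP": "N"},
    4: {"SP": "N", "DP": "N"},
}
X = {1, 2, 3}

classes = partition(objects, ["SP", "DP"])
lower = lower_approximation(classes, X)
upper = upper_approximation(classes, X)
print(lower, upper)
```

Object 3 belongs to X but shares its equivalence class with object 4, so that class contributes only to the upper approximation.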

2.2 The fuzzy-set concepts

The fuzzy-set theory was first proposed by Zadeh in 1965 [20]. It is primarily concerned with quantifying and reasoning using natural language in which words can have ambiguous meanings [20][24][25]. This can be thought of as an extension of traditional crisp sets in which each element must either be in or not in a set.

Formally, the process by which individuals from a universal set X are determined to be either members or non-members of a crisp set can be defined by a characteristic or discrimination function [20]. For a given crisp set A, this function assigns a value $\mu_A(x)$ to every $x \in X$ such that:

$$\mu_A(x) = \begin{cases} 1 & \text{if and only if } x \in A, \\ 0 & \text{if and only if } x \notin A. \end{cases}$$

This kind of function can be generalized such that the values assigned to the elements of the universal set fall within specified ranges, referred to as the membership grades of these elements in the set, with larger values denoting higher degrees of set membership. Such a function is called the membership function, $\mu_A(x)$, by which a fuzzy set A is usually defined. This function is represented by:

$$\mu_A : X \rightarrow [0, 1],$$

where [0, 1] denotes the interval of real numbers from 0 to 1, inclusive. The function can also be generalized to any real interval and is not restricted to [0, 1].

A special notation is often used in the literature to represent fuzzy sets. Assume that $x_1$ to $x_n$ are the elements in fuzzy set A, and $\mu_1$ to $\mu_n$ are, respectively, their grades of membership in A. A is then usually represented as follows:

$$A = \mu_1/x_1 + \mu_2/x_2 + \cdots + \mu_n/x_n.$$

An α-cut of a fuzzy set A is a crisp set $A_\alpha$ that contains all elements in the universal set X with membership grades in A greater than or equal to a specified value of α. This definition can be written as:

$$A_\alpha = \{x \in X \mid \mu_A(x) \geq \alpha\}.$$

The scalar cardinality of a fuzzy set A defined on a finite universal set X is the summation of the membership grades of all the elements of X in A. Thus,

$$|A| = \sum_{x \in X} \mu_A(x).$$

Among operations on fuzzy sets are the basic and commonly used complementation, union and intersection, as proposed by Zadeh. They are defined as follows.

(1) The complementation of a fuzzy set A is denoted by $\neg A$, and the membership function of $\neg A$ is given by:

$$\mu_{\neg A}(x) = 1 - \mu_A(x), \quad \forall x \in X.$$

(2) The intersection of two fuzzy sets A and B is denoted by $A \cap B$, and the membership function of $A \cap B$ is given by:

$$\mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\}, \quad \forall x \in X.$$

(3) The union of fuzzy sets A and B is denoted by $A \cup B$, and the membership function of $A \cup B$ is given by:

$$\mu_{A \cup B}(x) = \max\{\mu_A(x), \mu_B(x)\}, \quad \forall x \in X.$$

The above fuzzy operations are used in the proposed learning algorithm to find linguistic certain and possible rules.
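As a small illustration of the operations reviewed in this subsection, the following Python sketch represents a fuzzy set as a dictionary from elements to membership grades. The universe and the two fuzzy sets are hypothetical values chosen only for demonstration; elements missing from a dictionary are assumed to have grade 0.

```python
def complement(a, universe):
    """Membership of the complement: 1 - mu_A(x)."""
    return {x: 1.0 - a.get(x, 0.0) for x in universe}

def intersection(a, b, universe):
    """Membership of the intersection: min(mu_A(x), mu_B(x))."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in universe}

def union(a, b, universe):
    """Membership of the union: max(mu_A(x), mu_B(x))."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in universe}

def alpha_cut(a, alpha):
    """Crisp set of elements whose grade is at least alpha."""
    return {x for x, grade in a.items() if grade >= alpha}

def scalar_cardinality(a):
    """Sum of all membership grades."""
    return sum(a.values())

# Hypothetical fuzzy sets over a three-element universe.
universe = ["x1", "x2", "x3"]
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.4}

print(intersection(A, B, universe))
print(union(A, B, universe))
print(alpha_cut(A, 0.5))
print(scalar_cardinality(A))
```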

2.3. Hierarchical Attributes

Most of the previous studies on rough sets focused on finding certain rules and possible rules on the single concept level. However, hierarchical attributes are usually predefined in real-world applications and can be represented by hierarchy trees. Terminal nodes on the trees represent actual attribute values appearing in training examples; internal nodes represent value clusters formed from their lower-level nodes. Deriving rules on multiple concept levels may lead to the discovery of more general and important knowledge from data. A simple example for attribute Transport is given in Figure 1.

Transport
    Train
        Express Train
        Ordinary Train
    Car
        Expensive Car
        Cheap Car

Figure 1: An example of predefined hierarchical values for attribute Transport

In Figure 1, the attribute Transport falls into two general categories: Train and Car. Train can be further classified into two more specific categories, Express Train and Ordinary Train. Similarly, assume Car is divided into Expensive Car and Cheap Car. Only the terminal attribute nodes (Express Train, Ordinary Train, Expensive Car, Cheap Car) can appear in training examples.

The concept of equivalence classes in the rough set theory makes it very suitable for finding cross-level certain and possible rules from training examples with hierarchical values. The equivalence class of a non-terminal-level attribute value for attribute $A_j$ can be easily found by the union of its underlying terminal-level equivalence classes for $A_j$. Also, the equivalence class of a cross-level attribute value combination for more than one attribute can be derived from the intersection of the equivalence classes of its single attribute values. In this paper, we propose a fuzzy-rough learning algorithm for deriving cross-level certain and possible rules from training examples with hierarchical quantitative attribute values.
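The union and intersection constructions described above can be sketched in a few lines of Python. The hierarchy follows Figure 1, but the object ids assigned to each terminal value and the second attribute (Residence with values House and Building) are hypothetical data used only for illustration.

```python
# Hierarchy of Figure 1: each internal node lists its child nodes.
HIERARCHY = {
    "Train": ["Express Train", "Ordinary Train"],
    "Car": ["Expensive Car", "Cheap Car"],
}

# Hypothetical terminal-level equivalence classes (sets of object ids).
terminal_classes = {
    "Express Train": {1, 2},
    "Ordinary Train": {3},
    "Expensive Car": {4, 5},
    "Cheap Car": {6},
}

def node_class(node):
    """Equivalence class of a node: its own class if terminal, else the union over its children."""
    if node in terminal_classes:
        return set(terminal_classes[node])
    result = set()
    for child in HIERARCHY[node]:
        result |= node_class(child)
    return result

# Hypothetical equivalence classes of a second attribute.
residence_classes = {"House": {1, 3, 4}, "Building": {2, 5, 6}}

# Cross-level combination (Transport = Train, Residence = House) by intersection.
train_and_house = node_class("Train") & residence_classes["House"]
print(node_class("Train"), train_and_house)
```

Because the non-terminal classes are built from classes that already exist, no additional pass over the training examples is needed for the higher levels.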

4. Fuzzy-Rough Sets

In the past, we proposed a method which combined rough-set theory and fuzzy-set theory to deal with the classification problem [6]. It is extended here to find a set of cross-level maximally general fuzzy certain and possible rules from examples with hierarchical and quantitative attributes. Some definitions about fuzzy approximations are introduced below.

When the same linguistic term $R_{jk}$ of an attribute $A_j$ exists in two fuzzy objects Obj(i) and Obj(r) with membership values $f_{jk}^{(i)}$ and $f_{jk}^{(r)}$ equal to or larger than a certain α value, Obj(i) and Obj(r) are said to have a fuzzy α-indiscernibility relation (or fuzzy α-equivalence relation) on attribute $A_j$ with membership value $\min(f_{jk}^{(i)}, f_{jk}^{(r)})$. Also, if the same linguistic terms of an attribute subset B exist in both Obj(i) and Obj(r) with membership values equal to or larger than α, Obj(i) and Obj(r) are said to have a fuzzy α-indiscernibility relation (or a fuzzy α-equivalence relation) on attribute subset B with a membership value equal to the minimum of all the membership values. These fuzzy α-equivalence relations thus partition the fuzzy object set U into several fuzzy subsets that may overlap, and the result is denoted by U/B. The set of partitions, based on B and including Obj(i), is denoted B(Obj(i)). Thus,

$$B(Obj^{(i)}) = \{(B_1(Obj^{(i)}), \mu_{B_1}(Obj^{(i)})), \ldots, (B_r(Obj^{(i)}), \mu_{B_r}(Obj^{(i)}))\},$$

where r is the number of partitions included in B(Obj(i)), $B_j(Obj^{(i)})$ is the j-th partition in B(Obj(i)), and $\mu_{B_j}(Obj^{(i)})$ is the membership value of the j-th partition.

Example 1: Consider the following three fuzzy objects shown in Table 1. Obj(1) has a normal systolic pressure with a membership value of 0.1 and a high systolic pressure with a membership value of 0.75. Obj(1) also has a normal diastolic pressure with a membership value of 0.4 and a high diastolic pressure with a membership value of 0.8. Furthermore, Obj(1) is classified as having a high blood pressure. Obj(2) and Obj(3) can be explained in a similar way.

Table 1: The three fuzzy objects used in Example 1.

Object   Systolic Pressure (SP)   Diastolic Pressure (DP)   Blood Pressure (BP)
Obj(1)   (0.1/N + 0.75/H)         (0.4/N + 0.8/H)           H
Obj(2)   (1/H)                    (0.16/N + 0.6/H)          H
Obj(3)   (0.5/L + 0.3/N)          (0.4/N + 0.3/L)           L

Let the α value be set at 0.1. Since the same linguistic term (N) for attribute SP exists in both Obj(1) and Obj(3) and both the membership values are equal to or larger than 0.1, they have a fuzzy indiscernibility relation on the fuzzy term SP.N and thus form a fuzzy equivalence class with a membership value of min(0.1, 0.3) = 0.1. The other fuzzy indiscernibility relations can be similarly derived. U/{SP} has thus been formed and may be represented as follows:

U/{SP} = {({Obj(1), Obj(3)}, 0.1), ({Obj(1), Obj(2)}, 0.75), ({Obj(3)}, 0.5)}.

Similarly,

U/{DP} = {({Obj(1), Obj(2), Obj(3)}, 0.16), ({Obj(1), Obj(2)}, 0.6), ({Obj(3)}, 0.3)}.

Also, SP(Obj(1)) = {({Obj(1), Obj(3)}, 0.1), ({Obj(1), Obj(2)}, 0.75)}. It can be easily seen that Obj(1) exists in more than one fuzzy equivalence class. The set of fuzzy equivalence classes for an attribute subset B is referred to as a fuzzy B-elementary set.
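The derivation in Example 1 can be checked with a short Python sketch. The data are those of Table 1; the dictionary layout and the function name fuzzy_elementary_sets are illustrative choices rather than part of the original method description.

```python
from collections import defaultdict

# Table 1: object id -> attribute -> {linguistic term: membership value}.
TABLE_1 = {
    1: {"SP": {"N": 0.1, "H": 0.75}, "DP": {"N": 0.4, "H": 0.8}},
    2: {"SP": {"H": 1.0}, "DP": {"N": 0.16, "H": 0.6}},
    3: {"SP": {"L": 0.5, "N": 0.3}, "DP": {"N": 0.4, "L": 0.3}},
}

def fuzzy_elementary_sets(table, attr, alpha):
    """Map each linguistic term of attr to (member objects, class membership value)."""
    grades = defaultdict(dict)
    for obj, row in table.items():
        for term, grade in row[attr].items():
            if grade >= alpha:  # only terms passing the alpha threshold take part
                grades[term][obj] = grade
    # The class membership value is the minimum grade among its members.
    return {term: (frozenset(g), min(g.values())) for term, g in grades.items()}

sp = fuzzy_elementary_sets(TABLE_1, "SP", 0.1)
dp = fuzzy_elementary_sets(TABLE_1, "DP", 0.1)

# Classes containing Obj(1), i.e. SP(Obj(1)).
sp_obj1 = {term: c for term, c in sp.items() if 1 in c[0]}
print(sp)
print(dp)
print(sp_obj1)
```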

Fuzzy α-lower and fuzzy α-boundary approximations are defined below. Let X be an arbitrary subset of the universe U, and B be an arbitrary subset of the attribute set A. The fuzzy lower and the fuzzy boundary approximations under the threshold value α for B on X, denoted $B_*(X)$ and $B^*(X)$, respectively, are defined as follows:

$$B_*(X) = \{(B_k(x), \mu_{B_k}(x)) \mid x \in U,\ B_k(x) \subseteq X,\ 1 \leq k \leq |B(x)|\}, \text{ and}$$

$$B^*(X) = \{(B_k(x), \mu_{B_k}(x)) \mid x \in U,\ B_k(x) \cap X \neq \emptyset \text{ and } B_k(x) \not\subseteq X,\ 1 \leq k \leq |B(x)|\}.$$

Elements in $B_*(X)$ can be classified as members of set X with full certainty using attribute set B. Also, their membership values may be considered effectiveness measures of fuzzy lower approximations for future data. A low membership value with a fuzzy lower approximation means the lower approximation will have a low tolerance (or effectiveness) on future data. In this case, the fuzzy lower-approximation partitions have a high probability of being removed when future data are considered. All of the partitions are, however, valid for the current data set and can be used to correctly classify its elements.

On the other hand, elements in $B^*(X)$ can be classified as members of set X with only partial certainty using attribute set B, and their certainty degrees can be calculated from the membership values of elements in the boundary approximations.

Example 2: Continuing from Example 1, assume X = {Obj(1), Obj(2)}. The fuzzy lower approximation and the fuzzy boundary approximation for attribute SP according to X can be calculated as follows:

$SP_*(X)$ = {({Obj(1), Obj(2)}, 0.75)}, and

$SP^*(X)$ = {({Obj(1), Obj(3)}, 0.1)}.

After the fuzzy lower and the fuzzy boundary approximations have been found, certain and uncertain information can be analyzed, and rules can then be derived.
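Example 2 can be reproduced with a minimal Python sketch. The fuzzy equivalence classes are the U/{SP} result of Example 1, written as (members, membership value) pairs; the function names are illustrative.

```python
# U/{SP} from Example 1.
U_SP = [
    (frozenset({1, 3}), 0.1),
    (frozenset({1, 2}), 0.75),
    (frozenset({3}), 0.5),
]
X = {1, 2}

def fuzzy_lower(classes, target):
    """Classes completely contained in target."""
    return [(members, grade) for members, grade in classes if members <= target]

def fuzzy_boundary(classes, target):
    """Classes that intersect target without being contained in it."""
    return [
        (members, grade)
        for members, grade in classes
        if members & target and not members <= target
    ]

sp_lower = fuzzy_lower(U_SP, X)
sp_boundary = fuzzy_boundary(U_SP, X)
print(sp_lower, sp_boundary)
```

The class ({Obj(3)}, 0.5) has no object in X, so it appears in neither approximation.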

5. The Proposed Algorithm

In this section, a learning algorithm based on fuzzy rough sets is proposed to find fuzzy cross-level certain and possible rules from training data with hierarchical and quantitative attribute values. According to the definitions of the fuzzy lower approximation and the fuzzy upper approximation, it is easily seen that the fuzzy upper approximation includes the fuzzy lower approximation. Each fuzzy certain rule derived from the fuzzy lower approximation will therefore also be derived from the fuzzy upper approximation, which causes redundant derivation and wastes computational time. The proposed algorithm thus uses the fuzzy boundary approximation, instead of the fuzzy upper approximation, to derive the pure fuzzy possible rules, and so reduces the subsumption checking needed. For convenience, the symbol $B^*(X)$ is used from here on to represent the fuzzy boundary approximation of attribute subset B on X, instead of the fuzzy upper approximation.

The proposed algorithm first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions and finds the terminal-level fuzzy elementary sets of single attributes. These fuzzy equivalence classes can then be used later to find the non-terminal-level fuzzy elementary sets of single attributes and the cross-level fuzzy elementary sets of attribute combinations. Fuzzy lower approximations are used to derive fuzzy certain rules. Fuzzy boundary approximations, as mentioned above, are used to find fuzzy possible rules. The algorithm calculates the fuzzy lower and the fuzzy boundary approximations of single attributes from the terminal level to the higher levels. After that, the fuzzy lower and the fuzzy boundary approximations of more than one attribute are derived based on the results of single attributes. Some pruning heuristics are also used to avoid unnecessary search. The rule-derivation process based on these approximations is then performed to find maximally general fuzzy certain rules and fuzzy possible rules. The details of the proposed learning algorithm are described as follows.

A learning algorithm processing hierarchical and quantitative attributes by fuzzy rough sets:

Input: A quantitative data set with n objects, m hierarchical attributes, a set of membership functions, and a value α for the α-cut.

Output: A set of cross-level certain and possible rules.

Step 1: Partition the object set into disjoint subsets according to class labels. Denote each set of objects belonging to the same class Cl as Xl.

Step 2: Transform the quantitative value $v_j^{(i)}$ of each object Obj(i), i = 1 to n, for each appearing terminal-level node $A_j$ in Obj(i) into a fuzzy set $f_j^{(i)}$, represented as:

$$f_j^{(i)} = \left( \frac{f_{j1}^{(i)}}{R_{j1}} + \frac{f_{j2}^{(i)}}{R_{j2}} + \cdots + \frac{f_{jl}^{(i)}}{R_{jl}} \right),$$

using the given membership functions, where $R_{jk}$ is the k-th fuzzy region of a terminal-level attribute node $A_j$, $f_{jk}^{(i)}$ is $v_j^{(i)}$'s fuzzy membership value in region $R_{jk}$, and $l$ (= $|A_j|$) is the number of fuzzy regions for $A_j$.

Step 3: Remove the linguistic term $R_{jk}$ from a fuzzy set $f_j^{(i)}$ if its membership value $f_{jk}^{(i)}$ < α.

Step 4: Find the terminal-level fuzzy elementary sets of single attributes using fuzzy operations based on the definitions in Section 4.

Step 5: Set l = 1, where l is used to count the number of classes currently being processed.

Step 6: Compute the fuzzy lower approximation of each single terminal-level attribute node $A_j^t$ for class $X_l$ as:

$$A_{j*}^{t}(X_l) = \{(A_{jk}^{t}(x), \mu_{A_{jk}^{t}}(x)) \mid x \in U,\ A_{jk}^{t}(x) \subseteq X_l,\ 1 \leq k \leq |A_j^t(x)|\},$$

where $A_{jk}^{t}(x)$ is the terminal-level fuzzy equivalence class including object x and derived from the k-th fuzzy region of attribute node $A_j^t$, and $|A_j^t(x)|$ is the number of fuzzy regions for $A_j^t$.

Step 7: Compute the fuzzy boundary approximation of each single terminal-level attribute node $A_j^t$ for class $X_l$ as:

$$A_{j}^{t*}(X_l) = \{(A_{jk}^{t}(x), \mu_{A_{jk}^{t}}(x)) \mid x \in U,\ A_{jk}^{t}(x) \cap X_l \neq \emptyset,\ A_{jk}^{t}(x) \not\subseteq X_l,\ 1 \leq k \leq |A_j^t(x)|\}.$$

Step 8: Compute the fuzzy lower and fuzzy boundary approximations of each single non-terminal-level attribute node $A_j^{nt}$ for class $X_l$ from the terminal level to the root level by the following substeps:

(a) Derive the fuzzy equivalence class $A_{ji}^{nt_k}$ of the i-th fuzzy region for the non-terminal-level attribute node $A_j^{nt}$ on level k by the union of the equivalence classes of the same fuzzy regions in its underlying nodes.

(b) Put the equivalence class of $A_{ji}^{nt_k}$ into the k-level lower approximation for non-terminal-level attribute node $A_j^{nt}$ if all its underlying equivalence classes of the same fuzzy regions are in the (k+1)-level lower approximation for attribute node $A_j^{nt}$.

(c) Put the equivalence class of $A_{ji}^{nt_k}$ in the k-level boundary approximation for attribute node $A_j^{nt}$ if at least one of its underlying equivalence classes of the same fuzzy regions is in the (k+1)-level lower or boundary approximation for attribute node $A_j^{nt}$.

Step 9: Set q = 2, where q is used to count the number of attributes currently being processed.

Step 10: Compute the fuzzy lower and the fuzzy boundary approximations of each attribute set $B_j$ with q attributes (on any levels) for class $X_l$ from the terminal level to the root level by the following substeps. Only one fuzzy region from each attribute can be put in a combination.

(a) Skip the combinations which have the equivalence class of at least one of their subsets already in the lower approximation for $X_l$.

(b) Derive the equivalence class of each remaining combination by the intersection of the equivalence classes of its corresponding single attribute regions. Set the membership value for the derived equivalence class as the minimum of their membership values.

(c) Put the equivalence class $B_j(x)$ of each combination in substep (b) into the lower approximation for class $X_l$ if $B_j(x) \subseteq X_l$.

(d) Put the equivalence class $B_j(x)$ of each combination in substep (b) into the boundary approximation for class $X_l$ if $B_j(x) \cap X_l \neq \emptyset$ and $B_j(x) \not\subseteq X_l$.

Step 11: Set q = q + 1 and repeat Step 10 until q>m.

Step 12: Derive the certain rules from the fuzzy lower approximation $B_*(X_l)$ of any subset B, and set the membership values of elements in the lower approximation as effectiveness measures for future data.

Step 13: Remove certain rules more specific than others and keep the more general ones.

Step 14: Derive the possible rules from the fuzzy boundary approximation $B^*(X_l)$ of any subset B, set the membership values of elements in the boundary approximation as effectiveness measures for future data, and calculate the plausibility measure of each rule for $B_k(x)$ as:

$$p(B_k(x)) = \frac{\sum_{x \in B_k(x) \cap X_l} \mu_{B_k}(x)}{\sum_{x \in B_k(x)} \mu_{B_k}(x)}.$$

Step 15: Remove possible rules with condition parts more specific and plausibility values equal to or smaller than those of some other possible or certain rules.

Step 16: Set l = l + 1 and repeat Steps 6 to 15 until l > c, where c is the number of classes.

Step 17: Output the fuzzy certain rules and fuzzy possible rules.

After Step 17, certain and possible rules can then be derived, and can serve as meta-knowledge concerning the given data set.
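The plausibility measure of Step 14 can be sketched in Python under one explicit assumption: the value $\mu_{B_k}(x)$ summed in the formula is read here as the membership value of each individual object x in the fuzzy region that defines the class. That reading is an interpretation, not something the text states outright. The per-object values below are the Native Car memberships listed in Table 4 of the example in Section 6, and $X_H$ is the class set given in Step 1 of that example.

```python
def plausibility(memberships, target):
    """Ratio of membership mass inside target to the total membership mass of the class.

    memberships maps each object id in the fuzzy equivalence class to its
    membership value (assumed per-object reading of the Step 14 formula).
    """
    total = sum(memberships.values())
    inside = sum(grade for obj, grade in memberships.items() if obj in target)
    return inside / total

X_H = {1, 2, 6, 7}

# Native Car is Low: Obj(3), Obj(4), Obj(5), Obj(6) with their Table 4 memberships.
nc_low = {3: 0.5, 4: 0.9, 5: 0.75, 6: 1.0}
# Native Car is High: Obj(7), Obj(8).
nc_high = {7: 0.9, 8: 0.6}

p_low = plausibility(nc_low, X_H)
p_high = plausibility(nc_high, X_H)
print(round(p_low, 3), round(p_high, 3))
```

Under this reading, a boundary class whose high-membership objects lie mostly inside the target class yields a possible rule with a plausibility close to 1.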


6. An Example

In this section, a simple example is given to show how the proposed algorithm can be used to generate linguistic certain and possible rules from quantitative training data with hierarchical values. There are two decision attributes A = {Transport, Residence}, and a class attribute C = {Consumption Style}. Both attributes have a simple hierarchy as shown in Figures 2 and 3. There are two levels of hierarchical attribute values for attributes Transport and Residence. The roots representing the generic names of attributes (such as “Transport” and “Residence”) are located on level 0, and the terminal nodes representing actual values (such as “Imported car”) are on level 1. Only values of terminal nodes can appear in training examples. Assume the class has only two possible values: {High (H), Low (L)}.

Transport
    Imported Car
    Native Car

Figure 2: Hierarchy of Transport

Residence
    House
    Building

Figure 3: Hierarchy of Residence

Assume the training data set is shown in Table 2.

Table 2: The training data used in this example

Object    Transport          Residence       Consumption Style
Obj(1)    Imported car:190   House:1700      High
Obj(2)    Imported car:210   House:1800      High
Obj(3)    Native car:50      Building:230    Low
Obj(4)    Native car:42      Building:220    Low
Obj(5)    Native car:45      House:350       Low
Obj(6)    Native car:40      House:330       Low
Obj(7)    Native car:145     Building:1300   High
Obj(8)    Native car:130     Building:1400   High
Obj(9)    Imported car:50    Building:250    Low
Obj(10)   Imported car:60    Building:270    Low

Table 2 contains ten objects U = {Obj(1), Obj(2), …, Obj(10)}. Each training example has a quantity representing the cost (unit: ten thousand NT dollars) for the attribute value. Assume the membership functions of each terminal attribute node are as shown in Figure 4.

[Figure 4 has one panel per terminal attribute node, each showing three fuzzy regions L, M and H over cost in 10^4 dollars. Axis ticks: House 300, 500, 600, 800, 2000; Building 200, 400, 500, 700, 1500; Imported car 60, 80, 100, 120, 200; Native car 40, 60, 80, 100, 150.]

Figure 4: Membership functions of the attributes

The proposed algorithm then processes the data in Table 2 as follows.

Step 1: Since two classes exist in the data set, two partitions are found as follows:

$X_H$ = {Obj(1), Obj(2), Obj(6), Obj(7)}, and

$X_L$ = {Obj(3), Obj(4), Obj(5), Obj(8), Obj(9), Obj(10)}.

Step 2: The quantitative values of each object are transformed into fuzzy sets. Take the attribute Transport in Obj(1) as an example. The value “190” is converted into the fuzzy set (0.1/IC.M + 0.58/IC.H) using the given membership functions, where “IC” is an abbreviation of “Imported Car”. Similarly, NC, HO and BU represent Native Car, House and Building, respectively. The converted results for all the objects are shown in Table 3.

Table 3: The fuzzy sets transformed from the data in Table 2

Object    Transport             Residence             Consumption Style
Obj(1)    0.1/IC.M+0.58/IC.H    0.21/HO.M+0.75/HO.H   High
Obj(2)    1/IC.H                0.14/HO.M+0.83/HO.H   High
Obj(3)    0.5/NC.L+0.1/NC.M     0.85/BU.L+0.1/BU.M    Low
Obj(4)    0.9/NC.L+0.05/NC.M    0.9/BU.L+0.06/BU.M    Low
Obj(5)    0.75/NC.L+0.1/NC.M    0.75/HO.L+0.2/HO.M    Low
Obj(6)    1/NC.L                0.9/HO.L+0.1/HO.M     High
Obj(7)    0.07/NC.M+0.9/NC.H    0.2/BU.M+0.75/BU.H    High
Obj(9)    1/IC.L                0.75/BU.L+0.16/BU.M   Low
Obj(10)   1/IC.L                0.65/BU.L+0.23/BU.M   Low

Step 3: Assume the threshold value for the α-cut is set at 0.4. The linguistic terms in each object with membership values less than 0.4 are then discarded. The revised fuzzy data set is shown in Table 4.

Table 4: The fuzzy sets with their membership values equal to or larger than 0.4

Object    Transport             Residence   Consumption Style
Obj(1)    0.58/IC.H             0.75/HO.H   High
Obj(2)    1/IC.H                0.83/HO.H   High
Obj(3)    0.5/NC.L              0.85/BU.L   Low
Obj(4)    0.9/NC.L              0.9/BU.L    Low
Obj(5)    0.75/NC.L             0.75/HO.L   Low
Obj(6)    1/NC.L                0.9/HO.L    High
Obj(7)    0.9/NC.H              0.75/BU.H   High
Obj(8)    0.42/NC.M+0.6/NC.H    0.85/BU.L   Low
Obj(9)    1/IC.L                0.75/BU.L   Low
Obj(10)   1/IC.L                0.65/BU.L   Low
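Steps 2 and 3 can be sketched as follows. The exact shapes of the membership functions in Figure 4 are not recoverable from the text, so the piecewise-linear regions below are hypothetical and are not expected to reproduce the values in Table 3; the helper names piecewise_linear, fuzzify and alpha_cut are likewise illustrative. The last lines apply the α-cut to the Transport entry of Obj(1) as given in Table 3.

```python
def piecewise_linear(points):
    """Build a membership function interpolating linearly between (x, grade) points."""
    def mu(x):
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        for (x0, g0), (x1, g1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return g0 + (g1 - g0) * (x - x0) / (x1 - x0)
    return mu

# Hypothetical regions for "Imported car" (cost in 10^4 dollars); assumed shapes.
IMPORTED_CAR = {
    "L": piecewise_linear([(60, 1.0), (100, 0.0)]),
    "M": piecewise_linear([(60, 0.0), (100, 1.0), (200, 0.0)]),
    "H": piecewise_linear([(100, 0.0), (200, 1.0)]),
}

def fuzzify(value, regions):
    """Step 2: map a quantitative value to {region: membership}, dropping zero grades."""
    grades = {name: mu(value) for name, mu in regions.items()}
    return {name: g for name, g in grades.items() if g > 0.0}

def alpha_cut(fuzzy_set, alpha):
    """Step 3: keep only the linguistic terms whose membership is at least alpha."""
    return {term: g for term, g in fuzzy_set.items() if g >= alpha}

fuzzy_190 = fuzzify(190, IMPORTED_CAR)
kept = alpha_cut(fuzzy_190, 0.4)

# Step 3 applied to the Transport entry of Obj(1) in Table 3.
obj1_table3 = {"IC.M": 0.1, "IC.H": 0.58}
obj1_table4 = alpha_cut(obj1_table3, 0.4)
print(kept, obj1_table4)
```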

Step 4: The terminal-level fuzzy elementary sets of single attributes are found by fuzzy operations based on the definitions in Section 4. The results are shown as follows, with $Transport^t$ and $Residence^t$ representing the terminal-level values of the attributes Transport and Residence.

U/{$Transport^t$} = {({Obj(1), Obj(2)}, 0.58) ({Obj(9), Obj(10)}, 1) ({Obj(3), Obj(4), Obj(5), Obj(6)}, 0.5) ({Obj(7), Obj(8)}, 0.6) ({Obj(8)}, 0.42)};

U/{$Residence^t$} = {({Obj(1), Obj(2)}, 0.75) ({Obj(5), Obj(6), Obj(8)}, 0.75) ({Obj(3), Obj(4), Obj(9), Obj(10)}, 0.65) ({Obj(7)}, 0.75)}.

Step 5: l is set at 1. In this example, assume the class XH is first processed.

Step 6: The fuzzy lower approximation of each single terminal-level attribute node for class $X_H$ is calculated. Take $Transport^t$ as an example to illustrate the step. Since only the terminal-level equivalence class ({Obj(1), Obj(2)}, 0.58) for $Transport^t$ is completely included in $X_H$, which is {Obj(1), Obj(2), Obj(6), Obj(7)}, the fuzzy lower approximation of attribute $Transport^t$ for class $X_H$ is thus:

$Transport^t_*(X_H)$ = {({Obj(1), Obj(2)}, 0.58)}.

Similarly, the fuzzy lower approximation of $Residence^t$ for class $X_H$ is calculated as:

$Residence^t_*(X_H)$ = {({Obj(1), Obj(2)}, 0.75) ({Obj(7)}, 0.75)}.

Step 7: The fuzzy boundary approximation of each single terminal-level attribute node for class $X_H$ is calculated. Assume $Transport^t$ is first processed. Since its two equivalence classes ({Obj(3), Obj(4), Obj(5), Obj(6)}, 0.5) and ({Obj(7), Obj(8)}, 0.6) intersect $X_H$ but are not contained in $X_H$, the fuzzy boundary approximation of $Transport^t$ for class $X_H$ is calculated as:

$Transport^{t*}(X_H)$ = {({Obj(3), Obj(4), Obj(5), Obj(6)}, 0.5) ({Obj(7), Obj(8)}, 0.6)}.

Similarly, the fuzzy boundary approximation of $Residence^t$ for class $X_H$ is:

$Residence^{t*}(X_H)$ = {({Obj(5), Obj(6), Obj(8)}, 0.75)}.
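The Transport results of Steps 4, 6 and 7 can be reproduced from the Transport column of Table 4 with a short Python sketch. The class $X_H$ is taken from Step 1; the function and variable names are illustrative.

```python
from collections import defaultdict

# Transport column of Table 4: object id -> {linguistic term: membership value}.
TRANSPORT = {
    1: {"IC.H": 0.58}, 2: {"IC.H": 1.0},
    3: {"NC.L": 0.5}, 4: {"NC.L": 0.9}, 5: {"NC.L": 0.75}, 6: {"NC.L": 1.0},
    7: {"NC.H": 0.9}, 8: {"NC.M": 0.42, "NC.H": 0.6},
    9: {"IC.L": 1.0}, 10: {"IC.L": 1.0},
}
X_H = {1, 2, 6, 7}  # class set from Step 1

def elementary_sets(column):
    """Step 4: group objects by linguistic term; class membership is the minimum grade."""
    grades = defaultdict(dict)
    for obj, fuzzy_set in column.items():
        for term, grade in fuzzy_set.items():
            grades[term][obj] = grade
    return {term: (frozenset(g), min(g.values())) for term, g in grades.items()}

classes = elementary_sets(TRANSPORT)

# Step 6: classes completely contained in X_H.
lower = {t: c for t, c in classes.items() if c[0] <= X_H}
# Step 7: classes that intersect X_H without being contained in it.
boundary = {t: c for t, c in classes.items() if c[0] & X_H and not c[0] <= X_H}

print(classes)
print(lower)
print(boundary)
```

The classes for IC.L and NC.M contain no object of $X_H$ and therefore appear in neither approximation.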

Step 8: The fuzzy lower and fuzzy boundary approximations of each single non-terminal-level attribute node are calculated from the terminal level to the root

(21)

level by the following substeps.

(a) The fuzzy equivalence class for a non-terminal-level attribute node is derived by the union of the equivalence classes of the same fuzzy regions in its underlying nodes. For example, the equivalence class ({Obj(1), Obj(2)}, 0.58) is for the fuzzy region “Imported Car is High,” and ({Obj(7), Obj(8)}, 0.6) is for “Native Car is High.” A new equivalence class for “Transport is High” can then be derived by uniting the two equivalence classes as ({Obj(1), Obj(2), Obj(7), Obj(8)}, 0.58). Here the membership value for the new equivalence class is the minimum of the membership values of the two underlying equivalence classes. Similarly, the equivalence class for “Transport is Low” is computed as ({Obj(3), Obj(4), Obj(5), Obj(6), Obj(9), Obj(10)}, 0.5).

The equivalence classes for the other non-terminal-level attribute node Residencent are calculated as ({Obj(1), Obj(2), Obj(7)}, 0.75) and ({Obj(3), Obj(4), Obj(5), Obj(6), Obj(8), Obj(9), Obj(10)}, 0.65).
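The union operation of substep (a) can be illustrated with a short Python sketch. The helper name unite is hypothetical. The object sets and membership values are those given above for “Imported Car is High,” “Native Car is High,” and the two high-cost Residence classes.

```python
def unite(classes):
    """Union the object sets; the membership is the minimum over the parts."""
    objs = set().union(*(o for o, _ in classes))
    mu = min(m for _, m in classes)
    return objs, mu


imported_car_high = ({1, 2}, 0.58)
native_car_high = ({7, 8}, 0.6)
transport_high = unite([imported_car_high, native_car_high])

residence_high = unite([({1, 2}, 0.75), ({7}, 0.75)])

print(transport_high)   # ({1, 2, 7, 8}, 0.58)
print(residence_high)   # ({1, 2, 7}, 0.75)
```

Taking the minimum keeps the effectiveness of a higher-level rule conservative, since the rule is only as reliable as its weakest underlying region.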

(b) The fuzzy lower approximations of non-terminal-level attribute nodes are calculated. Take the non-terminal-level attribute node Residencent as an example. One of its equivalence classes, ({Obj(1), Obj(2), Obj(7)}, 0.75), is directly put into its fuzzy lower approximation since both its underlying equivalence classes, ({Obj(1), Obj(2)}, 0.75) and ({Obj(7)}, 0.75), are in the lower-level fuzzy lower approximation. Thus:

Residencent*(XH) = {({Obj(1), Obj(2), Obj(7)}, 0.75)}.

For Transportnt, since none of its equivalence classes has all its underlying equivalence classes in the lower-level fuzzy lower approximation, its fuzzy lower approximation is empty. Therefore, Transportnt*(XH) = ∅.

(c) The fuzzy boundary approximations of the non-terminal-level attribute nodes are calculated. Take the attribute node Transportnt for the class XH as an example. One of its equivalence classes, ({Obj(1), Obj(2), Obj(7), Obj(8)}, 0.58), is put into its non-terminal-level fuzzy boundary approximation for XH since at least one of its underlying equivalence classes, ({Obj(7), Obj(8)}, 0.6), is in its lower-level fuzzy boundary approximation. Similarly, its other equivalence class, ({Obj(3), Obj(4), Obj(5), Obj(6), Obj(9), Obj(10)}, 0.5), is also put into the non-terminal-level fuzzy boundary approximation. Transportnt*(XH) is thus found as:

Transportnt*(XH) = {({Obj(1), Obj(2), Obj(7), Obj(8)}, 0.58) ({Obj(3), Obj(4), Obj(5), Obj(6), Obj(9), Obj(10)}, 0.5)}.

Similarly, the non-terminal-level fuzzy boundary approximation for Residencent is found as:

Residencent*(XH) = {({Obj(3), Obj(4), Obj(5), Obj(6), Obj(8), Obj(9), Obj(10)}, 0.65)}.
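Substeps (b) and (c) decide the status of a non-terminal-level class from the status of its underlying classes, without looking at XH again. The Python sketch below follows the wording of the two substeps. It is a sketch under stated assumptions: classes are identified by their object sets, the grouping of terminal classes under “High” and “Low” is inferred from the unions in substep (a), and classify_parent is a hypothetical name.

```python
# Terminal-level results of Steps 6 and 7, per attribute.
lower_t = {
    "Transport": {frozenset({1, 2})},
    "Residence": {frozenset({1, 2}), frozenset({7})},
}
boundary_t = {
    "Transport": {frozenset({3, 4, 5, 6}), frozenset({7, 8})},
    "Residence": {frozenset({5, 6, 8})},
}

# Underlying terminal classes of each non-terminal region (inferred grouping).
children = {
    ("Transport", "High"): [frozenset({1, 2}), frozenset({7, 8})],
    ("Transport", "Low"): [frozenset({3, 4, 5, 6}), frozenset({9, 10})],
    ("Residence", "High"): [frozenset({1, 2}), frozenset({7})],
    ("Residence", "Low"): [frozenset({5, 6, 8}), frozenset({3, 4, 9, 10})],
}


def classify_parent(attr, parts):
    """Apply substep (b) first, then substep (c), as worded in the text."""
    if all(p in lower_t[attr] for p in parts):
        return "lower"      # every underlying class is a lower approximation
    if any(p in boundary_t[attr] for p in parts):
        return "boundary"   # at least one underlying class is in the boundary
    return None             # case not covered by the two substeps


status = {key: classify_parent(key[0], parts) for key, parts in children.items()}
print(status)
```

Only (Residence, High) ends up in a lower approximation, which agrees with the results reported for Residencent and Transportnt.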

Step 9: q is set at 2, where q is used to count the number of attributes currently being processed.

Step 10: The fuzzy lower and the fuzzy boundary approximations of each attribute set with 2 attributes for class XH are calculated from the terminal level to the root level. Only one fuzzy region from each attribute can be put in a combination. The combinations from the terminal levels are first processed by the following substeps.

(a) Any combination for which the equivalence class of at least one of its subsets is already in the lower approximation for XH is skipped. In this example, the equivalence classes for the three single attribute values (Transport = Imported_Car.High), (Residence = House.High) and (Residence = Building.High) are in the lower approximation for XH. The combinations including these three attribute values are therefore not considered in the later steps. Thus, only the following six combinations for Transportt and Residencet are considered.

(Transport = Imported_Car.Low, Residence = House.Low), (Transport = Imported_Car.Low, Residence = Building.Low), (Transport = Native_Car.High, Residence = House.Low), (Transport = Native_Car.High, Residence = Building.Low), (Transport = Native_Car.Low, Residence = House.Low), (Transport = Native_Car.Low, Residence = Building.Low).
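The pruning in substep (a) amounts to filtering the Cartesian product of the fuzzy regions of the two attributes. A minimal Python sketch follows. The region lists contain only the regions that occur in this example, and the names House.High and Building.High are assumed from the certain rules on high-cost houses and buildings derived in Step 12.

```python
from itertools import product

transport_regions = [
    "Imported_Car.High", "Imported_Car.Low", "Native_Car.High", "Native_Car.Low",
]
residence_regions = ["House.High", "House.Low", "Building.High", "Building.Low"]

# Single regions whose equivalence classes are already lower approximations.
in_lower = {
    ("Transport", "Imported_Car.High"),
    ("Residence", "House.High"),
    ("Residence", "Building.High"),
}

# Keep a pair only if neither of its regions is already a certain rule.
candidates = [
    (t, r)
    for t, r in product(transport_regions, residence_regions)
    if ("Transport", t) not in in_lower and ("Residence", r) not in in_lower
]

for pair in candidates:
    print(pair)
```

Sixteen pairs are reduced to six, because any rule built on a region that is already certain would only be a more specific certain rule.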

(b) The equivalence class of each combination above is found by the intersection of the equivalence classes of its corresponding single attribute regions. Take the combination (Transport = Native_Car.Low and Residence = House.Low) as an example. The equivalence class of (Transport = Native_Car.Low) is ({Obj(3), Obj(4), Obj(5), Obj(6)}, 0.5) and the one of (Residence = House.Low) is ({Obj(5), Obj(6), Obj(8)}, 0.75). The equivalence class of (Transport = Native_Car.Low and Residence = House.Low) is thus their intersection, which is ({Obj(5), Obj(6)}, 0.75). Note that the membership value is set as the minimum of the membership values over the two subsets. All the equivalence classes for the above combinations of {Transportt, Residencet} can be similarly derived as follows:

U/{Transportt, Residencet} = {({Obj(3), Obj(4)}, 0.5) ({Obj(9), Obj(10)}, 0.65) ({Obj(5), Obj(6)}, 0.75) ({Obj(8)}, 0.6)}.

(c) The equivalence class of each combination in substep (b) is put into the lower approximation for class XH if it is covered by XH. Since none of the equivalence classes in the above combinations is covered by the class XH, the fuzzy lower approximation for the attribute combination {Transportt, Residencet} is thus empty.

(d) The equivalence class of each combination in substep (b) is put into the boundary approximation for class XH if its intersection with XH is not empty. Since the equivalence class ({Obj(5), Obj(6)}, 0.75) of {Transportt, Residencet} contains the object Obj(6) in class XH, the fuzzy boundary approximation of {Transportt, Residencet} is shown below.

{Transportt, Residencet}*(XH) = {({Obj(5), Obj(6)}, 0.75)}.

After the fuzzy lower and the fuzzy boundary approximations for terminal-level attribute combinations are found, the above substeps are repeated for higher-level attribute combinations. They are {Transportt, Residencent}, {Transportnt, Residencet} and {Transportnt, Residencent} in this example. Since the equivalence class for (Residence = High) is in the lower approximation for XH, the combinations including (Residence = High) are not considered. The equivalence classes for the higher-level attribute combinations are found as follows.

U/{Transportt, Residencent} = {({Obj(3), Obj(4), Obj(5), Obj(6)}, 0.5) ({Obj(9), Obj(10)}, 0.65) ({Obj(8)}, 0.6)};

U/{Transportnt, Residencet} = {({Obj(3), Obj(4), Obj(9), Obj(10)}, 0.5) ({Obj(5), Obj(6)}, 0.75) ({Obj(8)}, 0.6)};

U/{Transportnt, Residencent} = {({Obj(3), Obj(4), Obj(5), Obj(6), Obj(9), Obj(10)}, 0.5) ({Obj(8)}, 0.6)}.
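One way to reproduce the three partitions above is to replace the terminal regions of a combination by their parent regions and to unite the classes that then share the same label, again taking the minimum membership value. The Python sketch below does this. It is an illustration, not the procedure stated in the paper: the region labels attached to each terminal-level class are inferred from the single-attribute classes, the parent labels Transport.Low, Transport.High and Residence.Low are hypothetical names, and so is the function lift.

```python
# Terminal-level combination classes of {Transport_t, Residence_t}.
terminal = {
    ("Native_Car.Low", "Building.Low"): ({3, 4}, 0.5),
    ("Imported_Car.Low", "Building.Low"): ({9, 10}, 0.65),
    ("Native_Car.Low", "House.Low"): ({5, 6}, 0.75),
    ("Native_Car.High", "House.Low"): ({8}, 0.6),
}

# Parent region of each terminal region (hypothetical labels).
parent = {
    "Native_Car.Low": "Transport.Low",
    "Imported_Car.Low": "Transport.Low",
    "Native_Car.High": "Transport.High",
    "House.Low": "Residence.Low",
    "Building.Low": "Residence.Low",
}


def lift(classes, lift_transport, lift_residence):
    """Relabel regions by their parents and merge classes with equal labels."""
    merged = {}
    for (t, r), (objs, mu) in classes.items():
        key = (parent[t] if lift_transport else t,
               parent[r] if lift_residence else r)
        if key in merged:
            old_objs, old_mu = merged[key]
            merged[key] = (old_objs | objs, min(old_mu, mu))
        else:
            merged[key] = (set(objs), mu)
    return merged


t_nt = lift(terminal, False, True)    # {Transport_t, Residence_nt}
nt_t = lift(terminal, True, False)    # {Transport_nt, Residence_t}
nt_nt = lift(terminal, True, True)    # {Transport_nt, Residence_nt}
print(t_nt)
print(nt_t)
print(nt_nt)
```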

The fuzzy lower approximations for the above combinations are empty since none of the equivalence classes is covered by the class XH. The fuzzy boundary approximations are found as:

{Transportt, Residencent}*(XH) = {({Obj(3), Obj(4), Obj(5), Obj(6)}, 0.5)};

{Transportnt, Residencet}*(XH) = {({Obj(5), Obj(6)}, 0.75)};

{Transportnt, Residencent}*(XH) = {({Obj(3), Obj(4), Obj(5), Obj(6), Obj(9), Obj(10)}, 0.5)}.

Step 11: q = 2 + 1 = 3. Since q is larger than the number of attributes (= 2), the next step is executed.

Step 12: The linguistic certain rules are derived from the fuzzy lower approximations, and the membership values in the lower approximations are set as the effectiveness measures of the rules for future data. In this example, the following four linguistic certain rules are derived:

1. If Transport is High_Cost Imported Car then Consumption Style is High, with future effectiveness = 0.58;

2. If Residence is High_Cost House then Consumption Style is High, with future effectiveness = 0.75;

3. If Residence is High_Cost Building then Consumption Style is High, with future effectiveness = 0.75;

4. If Residence is High_Cost then Consumption Style is High, with future effectiveness = 0.75.

Step 13: The certain rules that are more specific than others are removed and the more general ones are retained. Since the second and third rules are more specific than the fourth rule, these two rules are removed.
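The generality check of Step 13 can be sketched in Python as follows. This is a minimal sketch, assuming that a rule is more specific than another when both conclude the same class and every condition of the other rule is matched by an equal or descendant value. The data layout, the parent table (restricted to the links needed here) and the helper names covers and more_specific are hypothetical.

```python
# Each rule: (conditions, consequent class, future effectiveness).
rules = [
    ({"Transport": "High_Cost Imported Car"}, "High", 0.58),
    ({"Residence": "High_Cost House"}, "High", 0.75),
    ({"Residence": "High_Cost Building"}, "High", 0.75),
    ({"Residence": "High_Cost"}, "High", 0.75),
]

# Child -> parent links in the taxonomy, per attribute.
parent = {
    "Residence": {"High_Cost House": "High_Cost", "High_Cost Building": "High_Cost"},
    "Transport": {"High_Cost Imported Car": "High_Cost"},
}


def covers(attr, general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    value = specific
    while value is not None:
        if value == general:
            return True
        value = parent.get(attr, {}).get(value)
    return False


def more_specific(a, b):
    """True if rule a is strictly more specific than rule b."""
    cond_a, class_a, _ = a
    cond_b, class_b, _ = b
    if class_a != class_b or cond_a == cond_b:
        return False
    return all(
        attr in cond_a and covers(attr, value, cond_a[attr])
        for attr, value in cond_b.items()
    )


kept = [r for r in rules if not any(more_specific(r, other) for other in rules)]
for conditions, cls, effectiveness in kept:
    print(conditions, "->", cls, effectiveness)
```

The same test also treats a rule with extra conditions as more specific, since every condition of the shorter rule is still matched.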

Step 14: The linguistic possible rules are derived from the fuzzy boundary approximations, and the membership values in the boundary approximations are set as the effectiveness measures of the rules for future data. The plausibility measure of each rule is also calculated. For example, from Transportt*(XH) = {({Obj(3), Obj(4), Obj(5), Obj(6)}, 0.5) ({Obj(7), Obj(8)}, 0.6)}, the following two possible rules are derived:

1. If Transport is Low_Cost Native Car then Consumption Style is High, with a plausibility = $\frac{0.75}{0.5 + 0.9 + 0.75 + 1} = 0.23$, and a future effectiveness = 0.1;

2. If Transport is High_Cost Native Car then Consumption Style is High, with a plausibility = $\frac{0.9}{0.9 + 0.6} = 0.6$, and a future effectiveness = 0.1;

All the other possible rules are derived in the same way. They are shown as follows.

3. If Residence is Low_Cost House then Consumption Style is High, with a plausibility = 0.35, and a future effectiveness = 0.75;

4. If Transport is High_Cost then Consumption Style is High, with a plausibility = 0.51, and a future effectiveness = 0.58;

5. If Transport is Low_Cost then Consumption Style is High, with a plausibility = 0.19, and a future effectiveness = 0.5;

6. If Residence is Low_Cost then Consumption Style is High, with a plausibility = 0.15, and a future effectiveness = 0.65;

7. If Transport is Low_Cost Native Car and Residence is Low_Cost House then Consumption Style is High, with a plausibility = 0.53, and a future effectiveness = 0.75;

8. If Transport is Low_Cost Native Car and Residence is Low_Cost then

9. If Transport is Low_Cost and Residence is Low_Cost House then

10. If Transport is Low_Cost and Residence is Low_Cost then Consumption Style is High, with a plausibility = 0.19, and a future effectiveness = 0.5.
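The plausibility values of the first two rules are consistent with dividing the total membership of the objects of an equivalence class that belong to XH by the total membership of all objects of that class. The Python sketch below assumes this reading. The membership values are the numbers that appear in the two plausibility expressions of rules 1 and 2, but their assignment to individual objects (other than requiring one member of XH per class) is an assumption, as is the class set X_H = {1, 2, 6, 7}. The function name plausibility is hypothetical.

```python
X_H = {1, 2, 6, 7}  # assumed class set for "Consumption Style is High"

# Per-object membership values in each fuzzy region (assignment assumed).
native_car_low = {3: 0.5, 4: 0.9, 5: 1.0, 6: 0.75}
native_car_high = {7: 0.9, 8: 0.6}


def plausibility(memberships, target):
    """Membership mass inside the target class divided by the total mass."""
    inside = sum(mu for obj, mu in memberships.items() if obj in target)
    total = sum(memberships.values())
    return inside / total


p_low = plausibility(native_car_low, X_H)
p_high = plausibility(native_car_high, X_H)
print(f"{p_low:.3f}")
print(f"{p_high:.3f}")
```

Under these assumptions the first value is 0.75 / 3.15, which the example reports as 0.23, and the second is 0.9 / 1.5 = 0.6. The smallest membership value in each region (0.5 and 0.6) also matches the membership value of the corresponding equivalence class in Step 7.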

Step 15: Since the tenth rule is more specific than the fifth one and their plausibility values are equal, the tenth rule is removed from the set of possible rules.

Step 16: l = l + 1 = 1 + 1 = 2. Steps 6 to 15 are then repeated for the other class XL.

Step 17: All the linguistic certain rules and possible rules are output. They are shown as follows.

Linguistic certain rules:

1. If Transport is High_Cost Imported Car then Consumption Style is High, with future effectiveness = 0.58;

2. If Residence is High_Cost then Consumption Style is High, with future effectiveness = 0.75;

3. If Residence is Low_Cost Building then Consumption Style is Low, with future effectiveness = 0.65;

4. If Transport is Low_Cost Imported Car then Consumption Style is Low, with future effectiveness = 1;

5. If Transport is High_Cost Native Car and Residence is Low_Cost House then Consumption Style is Low, with future effectiveness = 0.6;

6. If Transport is Middle_Cost then Consumption Style is Low, with future effectiveness = 0.42.

Linguistic possible rules:

1. If Transport is Low_Cost Native Car then Consumption Style is High, with a plausibility = 0.23, and a future effectiveness = 0.1;

2. If Transport is High_Cost Native Car then Consumption Style is High, with a plausibility = 0.6, and a future effectiveness = 0.1;

3. If Residence is Low_Cost House then Consumption Style is High, with a plausibility = 0.35, and a future effectiveness = 0.75;

4. If Transport is High_Cost then Consumption Style is High, with a plausibility = 0.51, and a future effectiveness = 0.58;

5. If Transport is Low_Cost then Consumption Style is High, with a plausibility = 0.19, and a future effectiveness = 0.5;

6. If Residence is Low_Cost then Consumption Style is High, with a plausibility = 0.15, and a future effectiveness = 0.65;

7. If Transport is Low_Cost Native Car and Residence is Low_Cost House then Consumption Style is High, with a plausibility = 0.53, and a future effectiveness = 0.75;

8. If Transport is Low_Cost Native Car and Residence is Low_Cost then

9. If Transport is Low_Cost and Residence is Low_Cost House then

10. If Residence is Low_Cost House then Consumption Style is Low, with a plausibility = 0.65, and a future effectiveness = 0.75;

11. If Transport is Low_Cost Native Car then Consumption Style is Low, with a plausibility = 0.68, and a future effectiveness = 0.5;

12. If Transport is High_Cost Native Car then Consumption Style is Low, with a plausibility = 0.4, and a future effectiveness = 0.6;

=0.58, and a future effectiveness = 0.5;

14. If Residence is Low_Cost then Consumption Style is Low, with a plausibility = 0.15, and a future effectiveness = 0.75;

15. If Transport is High_Cost then Consumption Style is Low, with a plausibility = 0.19, and a future effectiveness = 0.58.

After Step 17, all the linguistic certain and possible rules are derived, and can serve as meta-knowledge concerning the given data set.

7. Discussion

In the proposed learning algorithm for handling training examples with hierarchical values, only the maximally general certain rules, instead of all certain ones, are kept for classification. Certain rules which are not maximally general are removed since they provide no new information. Take the maximally general rule “If Residence is High_Cost then Consumption Style is High, with future effectiveness = 0.75”, derived in the above section, as an example. All the descendant rules covered by the maximally general rule according to the taxonomy relation in Figure 2 are shown as follows:

1. If Residence is High_Cost Building then Consumption Style is High, with future effectiveness = 0.75;

2. If Residence is High_Cost House then Consumption Style is High, with future effectiveness = 0.75.

It can be easily verified that the above two rules are also certain rules. Besides, any rules generated by adding additional constraints into the maximally general rule or into its descendant rules are also certain. These include the following 18 rules:

1. If Transport is Car and Residence is Villa then Consumption Style is High;

2. If Transport is Car and Residence is Single House then Consumption Style is High;

3. If Transport is Car and Residence is Suite then Consumption Style is High;

4. If Transport is Car and Residence is Apartment then Consumption Style is High;

5. If Transport is Car and Residence is House then Consumption Style is High;

6. If Transport is Car and Residence is Building then Consumption Style is High;

7. If Transport is Expensive Car and Residence is Villa then Consumption Style is High;

8. If Transport is Expensive Car and Residence is Single House then Consumption Style is High;

9. If Transport is Expensive Car and Residence is Suite then Consumption Style is High;

10. If Transport is Expensive Car and Residence is Apartment then Consumption Style is High;

11. If Transport is Expensive Car and Residence is House then Consumption Style is High;

12. If Transport is Expensive Car and Residence is Building then Consumption Style is High;

13. If Transport is Cheap Car and Residence is Villa then Consumption Style is High;

14. If Transport is Cheap Car and Residence is Single House then Consumption Style is High;

15. If Transport is Cheap Car and Residence is Suite then Consumption Style is High;

16. If Transport is Cheap Car and Residence is Apartment then Consumption Style is High;

17. If Transport is Cheap Car and Residence is House then Consumption Style is High;

18. If Transport is Cheap Car and Residence is Building then Consumption Style is High.
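The count of 18 follows from pairing each of the three Transport values with each of the six Residence values named in the rules above. A short Python sketch that enumerates them is given below. The variable names are hypothetical and the value lists are taken from the rule texts.

```python
from itertools import product

transport_values = ["Car", "Expensive Car", "Cheap Car"]
residence_values = ["Villa", "Single House", "Suite", "Apartment", "House", "Building"]

# One specialised rule per (Transport, Residence) pair.
specialised = [
    f"If Transport is {t} and Residence is {r} then Consumption Style is High"
    for t, r in product(transport_values, residence_values)
]

for number, rule in enumerate(specialised, start=1):
    print(f"{number}. {rule}")
```

Since all of these rules are subsumed by the single maximally general rule, storing that one rule is enough for classification.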

The pruning procedure is embedded in the proposed algorithm. The above subsumption relation for certain rules is, however, not valid for possible rules. The plausibility of a parent possible rule will always lie between the minimum and the maximum plausibility values of its child rules. Take the possible rule “If Transport is Low_Cost then Consumption Style is High, with a plausibility = 0.19” derived in the above section as an example. Its two descendant rules according to the taxonomy relation in Figure 2 are shown as follows:

1. If Transport is Low_Cost Native Car then Consumption Style is High, with a plausibility = 0.23;

2. If Transport is Low_Cost Imported Car then Consumption Style is High, with a plausibility = 0.

It can be seen that the plausibility of the parent rule is between 0 and 0.23. Note that the second child rule will not actually be kept since its plausibility is zero. It is only shown here to demonstrate the relationship of the plausibility values in parent and child rules. The child rules with plausibility values less than their parent rules will also be kept by the proposed algorithm since they may provide some useful information about the classification. When a new event satisfies both a child rule and its parent rule, it is more accurate to derive the plausibility of the consequence from the child rule than from the parent rule. However, if a new event has an unknown terminal attribute value but a known non-terminal value, it can still be inferred using the parent rules. The proposed algorithm thus keeps all the possible rules except for those with plausibility = 0. If the child rules with plausibility values less than their parent rules are not to be kept, the proposed algorithm can be easily modified by simply adding a subsumption check after the step of generating the possible rules.

Besides, a plausibility threshold can be used in the proposed algorithm to avoid generating an overwhelming number of possible rules. The rules with plausibility values less than the threshold will thus be pruned. This checking step can easily be embedded in finding the boundary approximation to reduce the computational time further.
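The threshold check can be sketched in Python as a simple filter. The rules and plausibility values below are the first six possible rules for class XH from the example. The threshold of 0.3 is a hypothetical value chosen only for illustration, and the function name prune is likewise hypothetical.

```python
# (condition, consequent class, plausibility) for possible rules 1 to 6.
possible_rules = [
    ("Transport is Low_Cost Native Car", "High", 0.23),
    ("Transport is High_Cost Native Car", "High", 0.6),
    ("Residence is Low_Cost House", "High", 0.35),
    ("Transport is High_Cost", "High", 0.51),
    ("Transport is Low_Cost", "High", 0.19),
    ("Residence is Low_Cost", "High", 0.15),
]


def prune(rules, threshold):
    """Drop the rules whose plausibility is less than the threshold."""
    return [rule for rule in rules if rule[2] >= threshold]


kept = prune(possible_rules, threshold=0.3)  # hypothetical threshold
for condition, cls, value in kept:
    print(f"If {condition} then Consumption Style is {cls} ({value})")
```

With this threshold three of the six rules remain. Applying the same comparison while the boundary approximation is being built avoids generating the pruned rules in the first place.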

8. Conclusions and Future Works

In this paper, we have proposed a new learning algorithm based on fuzzy rough sets to find fuzzy cross-level certain and possible rules from training data with hierarchical and quantitative attribute values. The proposed method adopts the concept of fuzzy equivalence classes to find the terminal-level elementary sets of single attributes. These fuzzy equivalence classes are then easily used to find the non-terminal-level elementary sets of single attributes and the cross-level elementary sets of multiple attributes by the union and the intersection operations. Fuzzy lower and fuzzy boundary approximations are then derived from the elementary sets from the terminal level to the root level. Fuzzy boundary approximations, instead of fuzzy upper approximations, are used in the proposed algorithm to find fuzzy possible rules, thus reducing some subsumption checking. Fuzzy lower approximations are used to derive maximally general fuzzy certain rules. Some pruning heuristics are also used to avoid unnecessary search. The fuzzy rules derived can be used to infer results from a new event with both terminal and non-terminal attribute nodes. In the future, we will try to handle other kinds of learning or mining problems.
