Generating Learning Sequences for Decision Makers Through Data Mining and Competence Set Expansion

Yi-Chung Hu, Ruey-Shun Chen, and Gwo-Hshiung Tzeng

Abstract—For each decision problem, there is a competence set, proposed by Yu, consisting of the ideas, knowledge, information, and skills required for solving the problem. It is therefore reasonable to view a set of useful patterns discovered from a relational database by data mining techniques as a needed competence set for solving one problem. Significantly, when decision makers have not acquired the competence set, they may lack confidence in making decisions. In order to effectively acquire a needed competence set to cope with the corresponding problem, it is necessary to find appropriate learning sequences for acquiring those useful patterns, the so-called competence set expansion. This paper thus proposes an effective method consisting of two phases to generate learning sequences. The first phase finds a competence set consisting of useful patterns by using a proposed data mining technique. The second phase expands that competence set with minimum learning cost by the minimum spanning table method proposed by Feng and Yu. A numerical example shows that the data mining technique and the competence set expansion can help decision makers solve decision problems, enabling them to make better decisions.

Index Terms—Competence set, data mining, decision making, fuzzy sets.

NOMENCLATURE

$K$ Number of linguistic values of a linguistic variable.

$k$ Dimension of a fuzzy grid.

$d$ Number of attributes of a database relation, where $d \ge 1$.

$n$ Total number of tuples of a relational database.

$A^{x_m}_{K,i_m}$ $i_m$-th linguistic value of the $K$ linguistic values which are defined for the linguistic variable $x_m$, where $1 \le i_m \le K$.

$\mu^{x_m}_{K,i_m}$ Membership function of $A^{x_m}_{K,i_m}$.

$t_p$ $p$-th tuple of a specified relation, where $t_p = (t_{p1}, t_{p2}, \ldots, t_{pd})$ and $1 \le p \le n$.

I. INTRODUCTION

Data mining is the exploration and analysis of large quantities of data in order to discover meaningful patterns [1]. It extracts implicit, previously unknown, and potentially useful patterns from data [22]. Significantly, it is a methodology for the extraction of knowledge from data, knowledge relating to a problem that we wish to solve [2]. Some patterns, such as “purchase amount of product A is large” or “age of customers is close to 30” may be discovered from a relational database set up in one supermarket by data mining techniques, and these patterns could be useful for decision makers.

Decision makers can try to “learn” those useful patterns, that is, they can investigate the corresponding factors or current strategies that can result in those mining results. They can thus acquire or learn the corresponding knowledge from those useful patterns. Finally, decision makers can be confident of solving some decision problems, e.g., proposing a more competitive marketing strategy. To effectively learn the corresponding patterns, it is necessary to generate appropriate learning sequences of those patterns. For example, since learning directly from one pattern to another pattern requires learning cost, which can be measured by time or money, it may be more effective for decision makers to learn “purchase amount of product B is small” before learning “purchase amount of product A is large.” A similar example would be that, to obtain knowledge of mathematics, it would be appropriate to learn calculus before learning statistics.

Manuscript received September 3, 2001; revised January 8, 2002. This paper was recommended by Associate Editor W. Pedrycz.

Y.-C. Hu and R.-S. Chen are with the Institute of Information Management, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C.

G.-H. Tzeng is with the Institute of Management of Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C.

Publisher Item Identifier S 1083-4419(02)05723-0.

Competence sets were introduced by Yu [24], and their mathematical foundation was provided by Yu and Zhang [3]. For each decision problem (e.g., promoting products or improving services for a business), there exists a competence set consisting of ideas, knowledge, information, and skills for solving the problem [24]. From this viewpoint, it is reasonable to view a set of useful patterns discovered from a relational database by data mining techniques as a needed competence set for solving one problem. When decision makers have already acquired the needed competence set and are proficient at it, they will be comfortable and confident in making decisions [4], [23]. Otherwise, they must try to acquire the needed competence set to solve the problem. In order to acquire the competence set needed to cope with the decision problem at hand, it is necessary to find appropriate learning sequences for acquiring those useful patterns, the so-called competence set expansion.

This paper thus proposes an effective method consisting of two phases to generate learning sequences. The first phase finds a needed competence set consisting of useful patterns by a proposed algorithm for finding frequent and necessary fuzzy grids, which is a significant part of the fuzzy grids based rules mining algorithm (FGBRMA) proposed by Hu et al. [5]. The second phase expands that needed competence set with minimum learning cost by the minimum spanning table method proposed by Feng and Yu [6]. A numerical example shows that the data mining technique and the competence set expansion can help decision makers confidently solve the decision problems they face.

Since we incorporate the concepts of fuzzy partition into the proposed data mining technique, Section II introduces the cases for partitioning quantitative and categorical attributes by various linguistic values. Section III introduces the determination of useful patterns, the data structure for implementing the proposed algorithm, and a heuristic method for determining necessary patterns, and then briefly describes the proposed algorithm. The concepts of the competence set expansion are demonstrated in Section IV, which also briefly introduces the minimum spanning table method. A detailed simulation of a numerical example is described in Section V. Discussions and conclusions are presented in Section VI.

II. PARTITION ATTRIBUTES

Fuzzy sets were originally proposed by Zadeh [7], who also proposed the concept of linguistic variables and its applications to approximate reasoning [8]. Formally, a linguistic variable is characterized by a quintuple [9], [21] denoted by $(x, T(x), U, G, M)$, where

$x$ name of the variable;

$T(x)$ term set of $x$, that is, the set of names of linguistic values or terms of $x$, which are linguistic words or sentences in a natural language [10];

$U$ universe of discourse;

$G$ syntactic rule for generating values of $x$;

$M$ semantic rule for associating a linguistic value with a meaning.

For example, we can view “Age” as a linguistic variable, where $T(\text{Age}) = \{\text{young, close to 30, close to 50, old}\}$, $G$ is a rule which generates the linguistic values in $T(\text{Age})$, and $U = [0, 60]$. $M(\text{young})$ assigns a membership function to young.

A relational database is a collection of tables, each of which is assigned a unique name, consists of a set of attributes, and stores a large set of records or tuples [11]. It seems reasonable to view each attribute as a linguistic variable. It is noted that, if there exist $d$ attributes in a database, then a $d$-dimensional feature space is constructed and each attribute is viewed as an axis of this space. The cases for partitioning quantitative and categorical attributes by various linguistic values (i.e., fuzzy partition) are introduced as follows.

Fig. 1. K = 2 for “Age.”

A. Partition Quantitative Attributes

Quantitative attributes of a relational database are numeric and have an implicit ordering among values (e.g., age, salary) [11]. Moreover, a quantitative attribute can be partitioned by linguistic values, and this technique has been widely used in pattern recognition and fuzzy inference. For example, there are the applications to pattern classification by Ishibuchi et al. [12], [13], and Hu et al. [14], and to fuzzy rule generation by Wang and Mendel [15]. In addition, some methods for partitioning feature space were discussed by Sun [16] and Bezdek [17].

As we have mentioned earlier, a quantitative attribute can be partitioned by $K$ linguistic values ($K = 2, 3, 4, \ldots$). It should be noted that the value of $K$ depends on the actual requirement or preference of decision makers. For example, $K = 2$, $K = 3$, and $K = 4$ for the linguistic variable “Age” are depicted in Figs. 1, 2, and 3, respectively. For simplicity, triangular membership functions are used for each linguistic value defined in each quantitative attribute. A linguistic value denoted by $A^{\text{Age}}_{K,i_m}$ can be described in a linguistic sentence. For example

$A^{\text{Age}}_{K,1}$: young, and below $60/(K-1)$ (1)

$A^{\text{Age}}_{K,K}$: old, and above $[60 - 60/(K-1)]$ (2)

$A^{\text{Age}}_{K,i_m}$: close to $(i_m - 1) \times [60/(K-1)]$ and between $(i_m - 2) \times [60/(K-1)]$ and $i_m \times [60/(K-1)]$, for $1 < i_m < K$. (3)

In addition, the membership function of $A^{\text{Age}}_{K,i_m}$, which is denoted by $\mu^{\text{Age}}_{K,i_m}$, is stated as follows:

$$\mu^{\text{Age}}_{K,i_m}(x) = \max\left\{1 - \frac{|x - a^{K}_{i_m}|}{b^{K}},\ 0\right\} \tag{4}$$

where

$$a^{K}_{i_m} = mi + (ma - mi) \cdot \frac{i_m - 1}{K - 1} \tag{5}$$

$$b^{K} = \frac{ma - mi}{K - 1} \tag{6}$$

where $ma$ and $mi$ are the maximal value and the minimal value of the domain of Age (i.e., $U = [0, 60]$), respectively. In this example, $ma$ and $mi$ are equal to 60 and 0, respectively. Ishibuchi et al. [12] referred to such a partition method as the simple fuzzy grid method. Furthermore, a high-dimensional fuzzy grid can be generated. For example, if both attributes, e.g., “Age” ($x_1$) and “Salary” ($x_2$), are partitioned by three linguistic values, then the feature space is partitioned into $3 \times 3$ two-dimensional (2-D) fuzzy grids (i.e., a grid-type fuzzy partition), as shown in Fig. 4. For the shaded 2-D fuzzy grid shown in Fig. 4, whose linguistic value is “young AND high,” we use $A^{x_1}_{3,1} \wedge A^{x_2}_{3,3}$ to denote it.

Fig. 4. Both “Age” and “Salary” are divided into three fuzzy partitions.
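The triangular membership function of (4)-(6) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `triangular_membership` and its argument names are our own, and the default domain $[0, 60]$ is the “Age” example used above.

```python
def triangular_membership(x, i_m, K, mi=0.0, ma=60.0):
    """Membership degree of x in the i_m-th of K linguistic values, Eqs. (4)-(6).

    mi and ma are the minimal and maximal values of the attribute domain
    (defaults follow the "Age" example, U = [0, 60]).
    """
    if K < 2 or not 1 <= i_m <= K:
        raise ValueError("need K >= 2 and 1 <= i_m <= K")
    b = (ma - mi) / (K - 1)        # Eq. (6): half-width of each triangle
    a = mi + b * (i_m - 1)         # Eq. (5): peak of the i_m-th triangle
    return max(1.0 - abs(x - a) / b, 0.0)   # Eq. (4)


if __name__ == "__main__":
    # Degrees of three ages in "young", "close to 30", and "old" (K = 3).
    for age in (15, 30, 49):
        degrees = [round(triangular_membership(age, i, 3), 4) for i in (1, 2, 3)]
        print(age, degrees)
```

With $K = 3$, an age of 15 belongs equally (degree 0.5) to “young” and “close to 30,” which illustrates how neighboring linguistic values overlap in the simple fuzzy grid method.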

B. Partition Categorical Attributes

Categorical attributes of a relational database have a finite number of possible values, with no ordering among values (e.g., sex, color) [11]. If there are $n'$ distinct attribute values ($n'$ is finite), then this attribute can only be partitioned by $n'$ linguistic values. For example, since the attribute “Sex” is categorical, the linguistic sentence of each linguistic value can be stated as follows:

$A^{\text{Sex}}_{2,1}$: male (7)

$A^{\text{Sex}}_{2,2}$: female. (8)

A linguistic value $A^{x_m}_{n',i_m}$ is defined in $(i_m - \varepsilon, i_m + \varepsilon)$ $(\varepsilon \to 0)$, where the membership function of $A^{x_m}_{n',i_m}$ is “1” for $(i_m - \varepsilon, i_m + \varepsilon)$.

III. ALGORITHM FOR FINDING FREQUENT AND NECESSARY FUZZY GRIDS

In the proposed algorithm for finding frequent and necessary fuzzy grids, which is a significant part of the FGBRMA, $A^{x_m}_{K,i_m}$ is viewed as a candidate one-dimensional (1-D) fuzzy grid. After all candidate 1-D fuzzy grids have been generated, we need to determine how to generate frequent fuzzy grids from those candidate 1-D fuzzy grids. Each frequent fuzzy grid, denoted by its linguistic value, is a useful pattern for decision making. In fact, the terms “candidate” and “frequent” originated from the well-known apriori algorithm [18], which is an influential algorithm for mining association rules [11]. Generally, a frequent item set means that this set is useful for decision makers.

TABLE I
INITIAL TABULAR FGTTFS

Example: Any set of fruits, e.g., {Apple, Orange}, sold in one supermarket is a candidate item set. If the purchase frequency of {Apple, Orange} (i.e., the number of transactions that buy both apples and oranges divided by the total number of transactions) is larger than or equal to a prespecified threshold named support, then {Apple, Orange} is a frequent item set.

In the following sections, we introduce the determination of frequent fuzzy grids, the data structure for implementing the proposed algorithm, and a heuristic method for determining necessary patterns. Subsequently, we briefly describe the proposed algorithm.

A. Determine Frequent Fuzzy Grids

Suppose that each quantitative attribute $x_m$ is partitioned by $K$ linguistic values, and let the universe of discourse $U = \{t_1, t_2, \ldots, t_n\}$. Then, the linguistic value that is assigned to a candidate $k$-dimensional ($1 \le k \le d$) fuzzy grid, e.g., $A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_{k-1}}_{K,i_{k-1}} \wedge A^{x_k}_{K,i_k}$, can be alternatively represented as

$$A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k} = \sum_{p=1}^{n} \mu_{A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k}}(t_p)/t_p. \tag{9}$$

The degree to which $t_p$ belongs to $A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k}$ [i.e., $\mu_{A^{x_1}_{K,i_1} \wedge \cdots \wedge A^{x_k}_{K,i_k}}(t_p)$] can be computed as

$$\mu^{x_1}_{K,i_1}(t_{p1}) \cdot \mu^{x_2}_{K,i_2}(t_{p2}) \cdot \ldots \cdot \mu^{x_{k-1}}_{K,i_{k-1}}(t_{p,k-1}) \cdot \mu^{x_k}_{K,i_k}(t_{pk})$$

by the algebraic product. It is noted that, in comparison with other intersection operators such as the minimum operator or the drastic product, the algebraic product “gently” performs the fuzzy intersection. To check whether or not this fuzzy grid is frequent, we formally define the fuzzy support $FS(A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k})$ of $A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k}$ as follows:

$$FS\left(A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k}\right) = \frac{1}{n} \sum_{p=1}^{n} \mu^{x_1}_{K,i_1}(t_{p1}) \cdot \mu^{x_2}_{K,i_2}(t_{p2}) \cdot \ldots \cdot \mu^{x_k}_{K,i_k}(t_{pk}) \tag{10}$$

where $n$ is the number of training samples. When $FS(A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k})$ is larger than or equal to the user-specified minimum fuzzy support (min FS), we say that $A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k}$ is a frequent $k$-dimensional fuzzy grid. A frequent fuzzy grid actually stands for a useful pattern discovered from a relational database by the proposed data mining technique. This is similar to defining a frequent $k$-item set, the support of which is larger than or equal to the user-specified minimum support, as used in the apriori algorithm. It is noted that, if there exist $d$ attributes in a database, then the dimension of a fuzzy grid is at most $d$.
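As a minimal sketch of (10), the fuzzy support can be computed directly from the tuples and the 1-D membership functions. The function `fuzzy_support`, the helper `tri`, and the four-tuple relation below are hypothetical and serve only to illustrate the algebraic product and the averaging over tuples.

```python
from math import prod


def fuzzy_support(tuples, grid):
    """Eq. (10): mean over all tuples of the product of 1-D membership degrees.

    tuples: list of tuples t_p = (t_p1, ..., t_pd)
    grid:   dict mapping an attribute position m to the membership function
            of the linguistic value chosen for that attribute
    """
    if not tuples:
        raise ValueError("empty relation")
    total = 0.0
    for t in tuples:
        # Algebraic product of the membership degrees of one tuple.
        total += prod(mu(t[m]) for m, mu in grid.items())
    return total / len(tuples)


def tri(a, b):
    """Triangular membership function with peak a and half-width b, as in Eq. (4)."""
    return lambda x: max(1.0 - abs(x - a) / b, 0.0)


# Hypothetical relation with two attributes: (age, married).
tuples = [(30, "yes"), (45, "no"), (15, "yes"), (30, "no")]
# 2-D fuzzy grid "age close to 30 AND married = yes".
grid = {0: tri(30.0, 30.0), 1: lambda v: 1.0 if v == "yes" else 0.0}
fs = fuzzy_support(tuples, grid)
print(fs, fs >= 0.3)   # compare with a hypothetical min FS of 0.3
```

The categorical attribute uses a crisp membership function (1 for a match, 0 otherwise), in line with the partition of categorical attributes described in Section II-B.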

B. Data Structures and Tabular Operations

We employ the tabular FGTTFS to generate frequent fuzzy grids. It consists of the following substructures:

1) Fuzzy grids table (FG): each row represents a fuzzy grid, and each column represents a 1-D fuzzy grid denoted by $A^{x_m}_{K,i_m}$.

2) Transaction table (TT): each column represents $t_p$, and each element records the membership degree to which $t_p$ belongs to the corresponding fuzzy grid.

3) Column FS: stores the fuzzy support corresponding to the fuzzy grid in FG.

An example of an initial tabular FGTTFS is shown in Table I, from which we can see that there are two tuples, $t_1$ and $t_2$, and two attributes, $x_1$ and $x_2$, in a database relation. Both $x_1$ and $x_2$ are divided into three fuzzy partitions. Since each row of FG is a bit string consisting of 0 and 1, FG[u] and FG[v] (i.e., the $u$th row and the $v$th row of FG) can be paired to generate desired results by applying Boolean operations.

Example: If we apply the OR operation on two rows, FG[1] = (1, 0, 0, 0, 0, 0) and FG[4] = (0, 0, 0, 1, 0, 0), then (FG[1] OR FG[4]) = (1, 0, 0, 1, 0, 0), corresponding to a candidate 2-D fuzzy grid (i.e., $A^{x_1}_{3,1} \wedge A^{x_2}_{3,1}$), is generated. Then,

$$FS(A^{x_1}_{3,1} \wedge A^{x_2}_{3,1}) = TT[1] \cdot TT[4] = \left[\mu^{x_1}_{3,1}(t_{11}) \cdot \mu^{x_2}_{3,1}(t_{12}) + \mu^{x_1}_{3,1}(t_{21}) \cdot \mu^{x_2}_{3,1}(t_{22})\right]/2$$

is obtained for comparison with the min FS. If $A^{x_1}_{3,1} \wedge A^{x_2}_{3,1}$ is frequent, then the corresponding data [i.e., FG[1] OR FG[4], TT[1] $\cdot$ TT[4], and $FS(A^{x_1}_{3,1} \wedge A^{x_2}_{3,1})$] will be inserted into the corresponding data structures (i.e., FG, TT, and FS).
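The row operations of this example can be sketched as follows. The function name `join_rows` and the membership degrees stored in the two TT rows are hypothetical; only the bit strings FG[1] and FG[4] are taken from the example.

```python
def join_rows(fg_u, fg_v, tt_u, tt_v):
    """Combine two FGTTFS rows into the row of a candidate fuzzy grid.

    fg_u, fg_v: bit strings (tuples of 0/1) from table FG
    tt_u, tt_v: membership degrees of every tuple, from table TT
    Returns (FG[u] OR FG[v], element-wise product of TT rows, fuzzy support).
    """
    fg = tuple(a | b for a, b in zip(fg_u, fg_v))    # Boolean OR
    tt = tuple(x * y for x, y in zip(tt_u, tt_v))    # algebraic product
    fs = sum(tt) / len(tt)                           # Eq. (10)
    return fg, tt, fs


fg1 = (1, 0, 0, 0, 0, 0)    # FG[1]
fg4 = (0, 0, 0, 1, 0, 0)    # FG[4]
tt1 = (0.8, 0.2)            # hypothetical degrees of t_1 and t_2 in row 1
tt4 = (0.5, 1.0)            # hypothetical degrees of t_1 and t_2 in row 4

fg, tt, fs = join_rows(fg1, fg4, tt1, tt4)
print(fg, tt, fs)           # fs is approximately 0.3
```

Because TT already stores the degree of every tuple in each fuzzy grid, a higher-dimensional candidate is evaluated without rescanning the database.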

In the well-known apriori algorithm, two frequent $(k-1)$-item sets are joined to form a candidate $k$-item set ($3 \le k \le d$), and these two frequent item sets share $(k-2)$ items. Similarly, a candidate $k$-dimensional fuzzy grid is derived by joining two frequent $(k-1)$-dimensional fuzzy grids, and these two frequent grids share $(k-2)$ linguistic values. We say that two frequent $(k-1)$-dimensional fuzzy grids share $(k-2)$ linguistic values if the same $(k-2)$ linguistic values appear in both fuzzy grids.

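Example: If $A^{x_1}_{3,2} \wedge A^{x_2}_{3,1}$ and $A^{x_1}_{3,2} \wedge A^{x_3}_{3,3}$ are two frequent fuzzy grids, then we can use $A^{x_1}_{3,2} \wedge A^{x_2}_{3,1}$ and $A^{x_1}_{3,2} \wedge A^{x_3}_{3,3}$ to generate the candidate three-dimensional (3-D) fuzzy grid $A^{x_1}_{3,2} \wedge A^{x_2}_{3,1} \wedge A^{x_3}_{3,3}$ because both $A^{x_1}_{3,2} \wedge A^{x_2}_{3,1}$ and $A^{x_1}_{3,2} \wedge A^{x_3}_{3,3}$ share the linguistic value $A^{x_1}_{3,2}$. However, $A^{x_1}_{3,2} \wedge A^{x_2}_{3,1} \wedge A^{x_3}_{3,3}$ can also be constructed from $A^{x_1}_{3,2} \wedge A^{x_2}_{3,1}$ and $A^{x_2}_{3,1} \wedge A^{x_3}_{3,3}$. This means that we must ensure that a candidate fuzzy grid is not constructed more than once.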

To cope with this problem, the method we adopt here is as follows: if there exist $k$ integers $e_1, e_2, \ldots, e_{k-1}, e_k$, where $1 \le e_1 < e_2 < \cdots < e_{k-1} < e_k \le d$, such that FG[u, $e_1$] = FG[u, $e_2$] = $\cdots$ = FG[u, $e_{k-2}$] = FG[u, $e_{k-1}$] = 1 and FG[v, $e_1$] = FG[v, $e_2$] = $\cdots$ = FG[v, $e_{k-2}$] = FG[v, $e_k$] = 1, where FG[u] and FG[v] correspond to frequent $(k-1)$-dimensional fuzzy grids, then FG[u] and FG[v] can be paired to generate a candidate $k$-dimensional fuzzy grid. However, it should be noted that any two linguistic values defined in the same linguistic variable (i.e., attribute) cannot be combined into a fuzzy grid. For example, since FG[1] OR FG[2] = (1, 0, 0, 0, 0, 0) OR (0, 1, 0, 0, 0, 0), the result (1, 1, 0, 0, 0, 0) is invalid. Similarly, (1, 0, 1, 0, 0, 0), (0, 0, 0, 1, 1, 0), and (0, 0, 0, 1, 0, 1) are invalid.
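The pairing condition can be sketched as a small predicate. The name `can_pair` and the list `attr_of_col`, which maps every FG column to the index of its attribute, are our own assumptions; the sketch also assumes that both rows already encode valid fuzzy grids and that the indices $e_1, \ldots, e_k$ range over the columns of FG.

```python
def ones(row):
    """Column positions holding a 1 in a bit-string row of FG."""
    return [j for j, bit in enumerate(row) if bit]


def can_pair(fg_u, fg_v, attr_of_col):
    """True if rows u and v may be joined into a candidate k-dimensional grid.

    The rows must agree on their first k-2 linguistic values, the last value
    of u must precede the last value of v (so that every candidate is built
    exactly once), and the two last values must belong to different attributes.
    """
    eu, ev = ones(fg_u), ones(fg_v)
    if not eu or len(eu) != len(ev):
        return False
    if eu[:-1] != ev[:-1] or eu[-1] >= ev[-1]:
        return False
    return attr_of_col[eu[-1]] != attr_of_col[ev[-1]]


# Two attributes with three linguistic values each (six FG columns).
attr6 = [0, 0, 0, 1, 1, 1]
print(can_pair((1, 0, 0, 0, 0, 0), (0, 0, 0, 1, 0, 0), attr6))  # FG[1], FG[4]
print(can_pair((1, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0), attr6))  # FG[1], FG[2]

# Three attributes with three linguistic values each (nine FG columns).
attr9 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
u = (0, 1, 0, 1, 0, 0, 0, 0, 0)   # second value of x1 AND first value of x2
v = (0, 1, 0, 0, 0, 0, 0, 0, 1)   # second value of x1 AND third value of x3
w = (0, 0, 0, 1, 0, 0, 0, 0, 1)   # first value of x2 AND third value of x3
print(can_pair(u, v, attr9), can_pair(u, w, attr9))
```

In the nine-column case, only the pair (u, v) is accepted, so the 3-D candidate of the preceding example is generated once rather than twice.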

C. Determine Necessary Patterns for Learning

We consider that it is not necessary for decision makers to learn all useful patterns. We previously proposed a heuristic method [20] to determine which frequent fuzzy grids are necessary, and this method is incorporated into the proposed data mining technique. For any two frequent fuzzy grids, e.g., $A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_{k-1}}_{K,i_{k-1}} \wedge A^{x_k}_{K,i_k}$ and $A^{x_1}_{K,i_1} \wedge A^{x_2}_{K,i_2} \wedge \cdots \wedge A^{x_k}_{K,i_k} \wedge A^{x_{k+1}}_{K,i_{k+1}}$, since

$$\mu_{A^{x_1}_{K,i_1} \wedge \cdots \wedge A^{x_k}_{K,i_k} \wedge A^{x_{k+1}}_{K,i_{k+1}}}(t_p) \le \mu_{A^{x_1}_{K,i_1} \wedge \cdots \wedge A^{x_k}_{K,i_k}}(t_p)$$

from (9),

$$A^{x_1}_{K,i_1} \wedge \cdots \wedge A^{x_k}_{K,i_k} \wedge A^{x_{k+1}}_{K,i_{k+1}} \subseteq A^{x_1}_{K,i_1} \wedge \cdots \wedge A^{x_k}_{K,i_k}$$

thus holds.

For two frequent fuzzy grids, e.g., $F_i$ and $F_j$ ($i \ne j$), if $F_j$ is contained in $F_i$ (i.e., $F_j \subseteq F_i$), then $F_j$ is necessary and $F_i$ is unnecessary. That is, if $F_j$ is acquired by decision makers, then it is unnecessary for them to learn $F_i$. In addition, $F_i$ is not selected for generating learning sequences. For example, in comparison with $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$, the grid $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2}$ is unnecessary for decision makers to learn, since $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$ is contained in $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2}$.
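A minimal sketch of this heuristic is given below, assuming that each frequent fuzzy grid is represented by the set of its linguistic values; the representation as `frozenset` objects of (attribute, index) pairs and the name `necessary_grids` are our own. A fuzzy grid $F_j$ is contained in $F_i$ exactly when the linguistic values of $F_i$ form a proper subset of those of $F_j$.

```python
def necessary_grids(frequent):
    """Keep only the frequent fuzzy grids that are necessary to learn.

    frequent: list of frozensets of (attribute, linguistic value index) pairs.
    A grid is unnecessary if its linguistic values are a proper subset of
    those of another frequent grid, i.e., that other grid is contained in it.
    """
    return [g for g in frequent if not any(g < h for h in frequent)]


frequent = [
    frozenset({("Age", 2)}),
    frozenset({("Age", 2), ("Numcars", 2)}),
    frozenset({("Age", 2), ("Numcars", 2), ("Income", 2)}),
    frozenset({("Married", 1)}),
]
for grid in necessary_grids(frequent):
    print(sorted(grid))
```

Here the 1-D and 2-D grids on “Age” and “Numcars” are dropped because the 3-D grid is contained in both, while the grid on “Married” is kept because no other frequent grid is contained in it.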

We now briefly describe the algorithm for finding frequent and necessary fuzzy grids.

Algorithm: Algorithm for Finding Frequent and Necessary Fuzzy Grids

Input: a. A relational database
b. User-specified minimum fuzzy support

Output: Frequent and necessary fuzzy grids

Method:

Step 1. Fuzzy partitioning in each attribute.

Step 2. Scan the database, and then construct the initial table FGTTFS.

Step 3. Generate frequent fuzzy grids.

3-1: Generate frequent 1-D fuzzy grids. Set $k = 1$ and eliminate the rows of the initial FGTTFS corresponding to candidate 1-D fuzzy grids that are not frequent.

3-2: Generate frequent $k$-dimensional fuzzy grids. Set $k + 1$ to $k$. For any two unpaired rows, FGTTFS[u] and FGTTFS[v], corresponding to frequent $(k-1)$-dimensional fuzzy grids do

3-2-1. From (FG[u] OR FG[v]) corresponding to a candidate $k$-dimensional fuzzy grid $c$, if any two linguistic values are defined in the same linguistic variable (i.e., attribute), then discard $c$ and skip Steps 3-2-2, 3-2-3, and 3-2-4. That is, $c$ is invalid.

3-2-2. If FG[u] and FG[v] do not share $(k-2)$ linguistic values, then discard $c$ and skip Steps 3-2-3 and 3-2-4. That is, $c$ is invalid.

3-2-3. If there exist integers $1 \le e_1 < e_2 < \cdots < e_k$ such that FG[u, $e_1$] = FG[u, $e_2$] = $\cdots$ = FG[u, $e_{k-2}$] = FG[u, $e_{k-1}$] = FG[v, $e_1$] = FG[v, $e_2$] = $\cdots$ = FG[v, $e_{k-2}$] = FG[v, $e_k$] = 1, then compute (TT[u] $\cdot$ TT[v]) and the fuzzy support fs of $c$.

3-2-4. Add (FG[u] OR FG[v]) to FG, (TT[u] $\cdot$ TT[v]) to TT, and the fuzzy support fs to FS when fs is not smaller than the minimum fuzzy support; otherwise, discard $c$.

End

3-3. Check whether any frequent $k$-dimensional fuzzy grid has been generated. If so, go to Step 3-2. It is noted that the final FGTTFS stores only frequent fuzzy grids.

Step 4. Find necessary fuzzy grids. For any two rows, FGTTFS[u] and FGTTFS[v], corresponding to frequent fuzzy grids do

If the fuzzy grid corresponding to FG[v] is contained in the fuzzy grid corresponding to FG[u], then the frequent fuzzy grid corresponding to FG[u] is unnecessary, and FGTTFS[u] is eliminated; if instead the fuzzy grid corresponding to FG[u] is contained in that corresponding to FG[v], then FGTTFS[v] is eliminated.

End

Since the set of frequent and necessary fuzzy grids is viewed as a competence set for solving one decision problem, we try to use a method proposed by Feng and Yu [6] to expand it. This effective method is introduced in Section IV.

IV. COMPETENCE SET EXPANSION

As we have mentioned earlier, for each decision problem $E$, there is a competence set, denoted by $CS(E)$, consisting of the ideas, knowledge, information, and skills required for successfully solving the problem. In addition, there exists a skill set, denoted by $Sk(E)$, that has been acquired by decision makers. Roughly speaking, $CS(E)$ is the union of the already acquired competence set (i.e., $Sk(E)$) and the truly needed competence set denoted by $Tr(E)$. However, it seems that $Sk(E)$ alone is not sufficient to solve $E$. That is, in order to solve problem $E$, decision makers must acquire $Tr(E) \setminus Sk(E)$ through the competence set expansion. A competence set expansion means that we are trying to find an effective way to generate learning sequences for acquiring the needed skills so that the needed competence set $Tr(E) \setminus Sk(E)$ is obtained [6]. We depict the concept of the competence set expansion in Fig. 5. The shaded area shown in Fig. 5 is $Tr(E) \setminus Sk(E)$. Since the set of useful patterns discovered from a relational database can be viewed as a set of needed skills for solving one decision problem, e.g., $E$, it is necessary to generate learning sequences of useful patterns. However, for simplicity, $Sk(E)$ is not discussed in this paper. Previously, some methods for expanding the competence set were proposed, such as deduction graphs by Li and Yu [19] and the minimum spanning table method by Feng and Yu [6].

A competence set expansion can be roughly regarded as a tree construction process. For example, Feng and Yu employed a directed graph with an expansion table to find a spanning tree. This procedure views each useful pattern in the competence set as a node in a directed graph, and sets the learning cost $c(y_i, y_j)$, which may be measured by time or money, on the directed path directly from node $y_i$ to node $y_j$. It is noted that $c(y_i, y_j) \ne c(y_j, y_i)$ usually holds. Then, in this graph, we can find a spanning tree with minimum learning cost (i.e., a minimum spanning tree). An optimal expansion with minimum cost is thus acquired from the minimum spanning tree. The starting node in the directed path is the pattern that we suggest decision makers learn first. The minimum spanning table method is especially powerful for the expansion of a set of single skills. Therefore, we adopt this algorithm to generate a minimum spanning tree. We briefly introduce the minimum spanning table method as follows.

Fig. 5. Competence set “Expansion.”

Algorithm: Minimum spanning table method

Input: A directed graph with an expansion table $T_0$. The element of $T_0$ stores the learning cost $c(y_i, y_j)$ directly from node $y_i$ to node $y_j$. Initially, no columns of $T_0$ are crossed out, and an integer counter is set to zero.

Output: The minimum spanning table and the corresponding spanning tree.

Method:

Step 1. Selecting and marking procedure

Select the smallest element among the remaining not-crossed-out columns of the current expansion table, and mark it.

Step 2. Cycle detecting procedure

Determine whether a cycle has been formed among the marked elements; if so, go to Step 5.

Step 3. Crossing out procedure

Cross out the column to which the newly selected marked element belongs.

Step 4. Stopping rule

If only one not-crossed-out column is left, then the minimum spanning table can be constructed, and go to Step 6; otherwise, go to Step 1.

Step 5. Compressing procedure

Compress the nodes in the detected cycle into a node $x$. Define transformation equations as follows:

$$c(x, y_i) = \min\{c(y, y_i) \mid y \in x\} \tag{11}$$

$$c(y_i, x) = \min\{c(y_i, y) + c(y_s, y_t) - c(y', y) \mid y \in x\} \tag{12}$$

where $c(y_s, y_t)$ is the largest cost in the cycle $x$, and $c(y', y)$ is the marked element in the column of $y$. Increase the counter by one; a new expansion table is constructed, and then go to Step 1.

Step 6. Unfolding procedure

From the minimum spanning table of a compressed expansion table, the minimum spanning table of the preceding expansion table can be generated. Note that $ST_0$ produced by the unfolding procedure is the minimum spanning table of $T_0$.
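For a small competence set, the result of the expansion can be cross-checked by exhaustive search. The sketch below is not the minimum spanning table method of Feng and Yu: it simply enumerates every rooted directed spanning tree and keeps the cheapest one, which is practical only for a handful of nodes. The function names and the three-node cost table are hypothetical.

```python
from itertools import product


def reaches_root(parent, root):
    """True if following parent links from every node ends at the root."""
    for v in parent:
        seen = set()
        while v != root:
            if v in seen:          # a cycle that never reaches the root
                return False
            seen.add(v)
            v = parent[v]
    return True


def min_cost_learning_tree(cost):
    """Exhaustive search for a directed spanning tree of minimum total cost.

    cost[i][j] is the learning cost directly from node i to node j.
    Returns (total cost, root, parent), where parent[v] is the node learned
    immediately before v and the root is the pattern to learn first.
    """
    n = len(cost)
    best = (float("inf"), None, None)
    for root in range(n):
        others = [v for v in range(n) if v != root]
        choices = [[u for u in range(n) if u != v] for v in others]
        for parents in product(*choices):
            parent = dict(zip(others, parents))
            if not reaches_root(parent, root):
                continue
            total = sum(cost[parent[v]][v] for v in others)
            if total < best[0]:
                best = (total, root, parent)
    return best


# Hypothetical asymmetric learning costs between three patterns.
cost = [
    [0.0, 0.2, 0.9],
    [0.6, 0.0, 0.3],
    [0.4, 0.8, 0.0],
]
total, root, parent = min_cost_learning_tree(cost)
print(total, root, parent)
```

For this cost table the cheapest tree starts at node 0 and proceeds to node 1 and then node 2, with a total cost of 0.5. Such a brute-force check grows exponentially with the number of nodes, which is why a dedicated procedure such as the minimum spanning table method is preferable in practice.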

The minimum spanning table method stops in a finite number of steps, at which point an optimal expansion of the competence set can be acquired from a spanning tree with minimum learning cost. In the following section, we present a numerical example to demonstrate how the data mining technique and the competence set expansion can help decision makers solve decision problems.

V. NUMERICAL EXAMPLE

In this section, a specified database relation EMP stores the basic data of customers for one supermarket, with five attributes and ten tuples $t_p$ ($1 \le p \le 10$), as shown in Table II. For simplicity, some columns or rows of the following tables are omitted, as denoted by “. . .”

Phase I. Find frequent and necessary fuzzy grids

• Fuzzy partitioning in each attribute

Since both “Age” and “Income” are quantitative attributes, $K = 3$ is considered for these two attributes. Suppose that the universe of discourse of “Age” is [0, 60], and that of “Income” is [15 000, 60 000]. Then six linguistic values (i.e., three for Age and three for Income) are generated as follows:

Age: $A^{\text{Age}}_{3,1}$: young, and below 30;
$A^{\text{Age}}_{3,2}$: close to 30, and between 0 and 60;
$A^{\text{Age}}_{3,3}$: old, and above 30.

Income: $A^{\text{Income}}_{3,1}$: low, and below 37 500;
$A^{\text{Income}}_{3,2}$: close to 37 500, and between 15 000 and 60 000;
$A^{\text{Income}}_{3,3}$: high, and above 37 500.

On the other hand, “Married,” “Numcars,” and “Career” are categorical attributes. The linguistic values defined in each attribute are described as follows:

Married: $A^{\text{Married}}_{2,1}$: yes; $A^{\text{Married}}_{2,2}$: no.

Numcars: $A^{\text{Numcars}}_{2,1}$: zero cars; $A^{\text{Numcars}}_{2,2}$: one car.

Career: $A^{\text{Career}}_{4,1}$: student; $A^{\text{Career}}_{4,2}$: teacher; $A^{\text{Career}}_{4,3}$: engineer; $A^{\text{Career}}_{4,4}$: business person.

• Find frequent 1-D fuzzy grids

After scanning EMP, we can construct the initial FGTTFS. Assuming that the user-specified min FS is 0.3, we can find frequent 1-D fuzzy grids by deleting the infrequent fuzzy grids from the initial FGTTFS. We reconstruct FGTTFS, as shown in Table III(a) and (b).

Now, rows 1, 2, 3, 4, 5, 6, and 7 correspond to the frequent 1-D fuzzy grids $A^{\text{Age}}_{3,2}$, $A^{\text{Married}}_{2,1}$, $A^{\text{Married}}_{2,2}$, $A^{\text{Numcars}}_{2,2}$, $A^{\text{Income}}_{3,2}$, $A^{\text{Career}}_{4,3}$, and $A^{\text{Career}}_{4,4}$, respectively. We can find that (FG[2] OR FG[3]) = (0, 1, 1, 0, 0, 0, 0), which corresponds to the candidate 2-D fuzzy grid $c = A^{\text{Married}}_{2,1} \wedge A^{\text{Married}}_{2,2}$, is generated. However, $c$ will be discarded since both $A^{\text{Married}}_{2,1}$ and $A^{\text{Married}}_{2,2}$ are linguistic values of the linguistic variable “Married.”

We select FGTTFS[1] and FGTTFS[2] as an example to show how a frequent 2-D fuzzy grid is generated. FG[1] and TT[1] corresponding to $A^{\text{Age}}_{3,2}$ are (1, 0, 0, 0, 0, 0, 0) and (0.6333, 0.8333, 0.7667, 0.90, 0.50, 0.1333, 0.60, 0.6667, 0.90, 0.8333), respectively. In addition, FG[2] and TT[2] corresponding to $A^{\text{Married}}_{2,1}$ are (0, 1, 0, 0, 0, 0, 0) and (0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0), respectively. (FG[1] OR FG[2]) = (1, 1, 0, 0, 0, 0, 0), corresponding to a candidate 2-D fuzzy grid (i.e., $c = A^{\text{Age}}_{3,2} \wedge A^{\text{Married}}_{2,1}$), is generated. Then, we calculate (TT[1] $\cdot$ TT[2]) = (0.0, 0.8333, 0.0, 0.90, 0.50, 0.1333, 0.0, 0.0, 0.90, 0.8333), and the fuzzy support $FS(c) = 0.410$ is computed. Since $FS(c)$ is not smaller than the min FS of 0.3, $c$ is a frequent 2-D fuzzy grid. The frequent 2-D fuzzy grids that can be inserted into the table FGTTFS are shown in Table IV(a) and (b).

TABLE II
RELATION EMP

TABLE III
(a) FREQUENT 1-DIM FUZZY GRIDS IN TABLE FG. (b) FREQUENT 1-DIM FUZZY GRIDS IN TABLE TTFS
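The fuzzy support of the candidate grid on “Age” and “Married” can be re-derived from the two TT rows quoted in this example. The variable names below are our own; the membership degrees and the min FS of 0.3 are those given in the text.

```python
# TT rows of the frequent 1-D fuzzy grids "Age close to 30" and "Married = yes".
tt_age = [0.6333, 0.8333, 0.7667, 0.90, 0.50, 0.1333, 0.60, 0.6667, 0.90, 0.8333]
tt_married = [0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]

# Element-wise algebraic product gives the TT row of the candidate 2-D grid c.
tt_c = [a * b for a, b in zip(tt_age, tt_married)]

# Eq. (10): the fuzzy support is the mean membership degree over the ten tuples.
fs_c = sum(tt_c) / len(tt_c)
is_frequent = fs_c >= 0.3

print([round(v, 4) for v in tt_c])
print(round(fs_c, 3), is_frequent)
```

The six nonzero products sum to 4.0999, so the fuzzy support is about 0.410, which exceeds the min FS of 0.3.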

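In this step, six frequent 3-D fuzzy grids can be further discovered from those frequent 2-D fuzzy grids. These frequent 3-D fuzzy grids are $A^{\text{Age}}_{3,2} \wedge A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2}$; $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$; $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Career}}_{4,3}$; $A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$; $A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Career}}_{4,4}$; and $A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2} \wedge A^{\text{Career}}_{4,3}$. The corresponding FGTTFS is omitted here for simplicity.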

• Find necessary patterns for finding learning sequences

We can observe that no frequent four-dimensional (4-D) fuzzy grid can be generated. Thus, we must find necessary patterns from the frequent 1-D, 2-D, and 3-D fuzzy grids. It is easy to find six frequent and necessary fuzzy grids, consisting of $A^{\text{Age}}_{3,2} \wedge A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2}$; $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$; $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Career}}_{4,3}$; $A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$; $A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Career}}_{4,4}$; and $A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2} \wedge A^{\text{Career}}_{4,3}$.

Phase II. Competence set expansion

Next, we employ the minimum spanning table method to expand the competence set consisting of the necessary and frequent fuzzy grids discovered in Phase I. First, each necessary fuzzy grid corresponds to a node in the digraph, as shown in Table V. An expansion table $T_0$ is constructed as Table VI with hypothetical learning costs.

Finally, $ST_0$ of $T_0$ is shown in the two alternatives of Tables VII and VIII. The minimum spanning trees corresponding to Tables VII and VIII are shown in Figs. 6 and 7, respectively. The minimum learning cost is 2.7245. Learning sequences with minimum learning cost can be directly acquired from the minimum spanning trees. For example, the learning sequence shown in Fig. 6 indicates that $A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Career}}_{4,4}$ (i.e., node 5) is the useful pattern that we suggest decision makers learn first. $A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$ (i.e., node 4) is learned after $A^{\text{Married}}_{2,1} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Career}}_{4,4}$ has been learned. Subsequently, $A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2} \wedge A^{\text{Career}}_{4,3}$ (i.e., node 6) is suggested to be learned. The main difference between Figs. 6 and 7 is that $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Income}}_{3,2}$ (i.e., node 2) and $A^{\text{Age}}_{3,2} \wedge A^{\text{Numcars}}_{2,2} \wedge A^{\text{Career}}_{4,3}$ (i.e., node 3) are suggested to be learned simultaneously in Fig. 7. Since learning sequences are not unique, decision makers will subjectively select one of the learning sequences to acquire the competence set.

TABLE IV
(a) FREQUENT 2-DIM FUZZY GRIDS IN TABLE FG. (b) FREQUENT 2-DIM FUZZY GRIDS IN TABLE TTFS

TABLE V
EACH FREQUENT AND NECESSARY FUZZY GRID CORRESPONDS TO A NODE

TABLE VI
EXPANSION TABLE T

TABLE VII
SELECT c( ; ) AND c( ; ), AND THEN CONSTRUCT ST OF T

TABLE VIII
SELECT c( ; ) AND c( ; ), AND THEN CONSTRUCT ST OF T

Fig. 7. Minimum spanning tree corresponds to Table VIII.

VI. DISCUSSIONS AND CONCLUSIONS

The primary contribution of this paper is to present a useful tool to support decision making through the data mining technique and the competence set expansion. The simulation results of the numerical example show that the data mining technique and the competence set expansion can help decision makers confidently solve decision problems. Significantly, this is a starting point for integrating data mining techniques with the expansion of the competence set. Each data mining technique, such as clustering, can discover its particular type of useful patterns, which can also be viewed as a competence set for solving one decision problem. Then, the optimal expansion of the competence set with minimum learning cost becomes quite important. In this paper, we first use the proposed data mining technique to find the necessary patterns from a database. Then, we use the minimum spanning table method to optimally expand a needed competence set. The feasibility and effectiveness of combining various other data mining techniques with the competence set expansion remain to be studied.

In fact, the meaning of the linguistic values of a quantitative attribute $x_m$ can be changed by a linguistic hedge [9], [10], such as “very” or “more or less.” For example,

$$\text{very } A^{x_m}_{K,i_m} = \left(A^{x_m}_{K,i_m}\right)^{2} \tag{13}$$

$$\text{more or less } A^{x_m}_{K,i_m} = \left(A^{x_m}_{K,i_m}\right)^{1/2}. \tag{14}$$

The membership functions of $(A^{x_m}_{K,i_m})^{2}$ and $(A^{x_m}_{K,i_m})^{1/2}$ are $[\mu^{x_m}_{K,i_m}(x)]^{2}$ and $[\mu^{x_m}_{K,i_m}(x)]^{1/2}$, respectively. The use of the linguistic hedge will make the frequent fuzzy grids discovered from a database more friendly and more flexible for decision makers. However, the number of linguistic values of each attribute may be available from domain experts.
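The effect of the hedges in (13) and (14) can be checked numerically. The Python sketch below applies them to a membership degree: “very” squares the degree and “more or less” takes its square root. The triangular membership function and its parameters (a linguistic value peaking at 40 on a support from 20 to 60) are assumptions chosen for illustration; the function names are hypothetical as well.

```python
import math


def triangular(x, left, peak, right):
    """Triangular membership function (an assumed shape for illustration)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def very(mu):
    """Hedge 'very' as in (13): square the membership degree."""
    return mu ** 2


def more_or_less(mu):
    """Hedge 'more or less' as in (14): square root of the membership degree."""
    return math.sqrt(mu)


mu = triangular(35.0, 20.0, 40.0, 60.0)  # degree of the unmodified value
print(mu, very(mu), more_or_less(mu))
```

Because a membership degree lies in [0, 1], `very` never increases it and `more_or_less` never decreases it, so the first hedge narrows a linguistic value and the second widens it.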

ACKNOWLEDGMENT

The authors are very grateful to the anonymous referees for their valuable comments and constructive suggestions.

REFERENCES

[1] M. Berry and G. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support. New York: Wiley, 1997.

[2] S. Myra, “Web usage mining for web site evaluation,” Commun. ACM, vol. 43, no. 8, pp. 127–134, 2000.

[3] P. L. Yu and D. Zhang, “A foundation for competence set analysis,” Math. Social Sci., vol. 20, pp. 251–299, 1990.

[4] M. J. Hwang, C. I. Chiang, I. C. Chiu, and G. H. Tzeng, “Multistages optimal expansion of competence sets in fuzzy environment,” Int. J. Fuzzy Syst., vol. 3, no. 3, pp. 486–492, 2001.

[5] Y. C. Hu, R. S. Chen, and G. H. Tzeng, “Discovering fuzzy association rules using fuzzy partition methods,” Knowl.-Based Syst., accepted.

[6] J. W. Feng and P. L. Yu, “Minimum spanning table and optimal expansion of competence set,” J. Optim. Theory Appl., vol. 99, no. 3, pp. 655–679, 1998.

[7] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, 1965.

[8] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning,” Inf. Sci., (part 1) vol. 8, no. 3, pp. 199–249, 1975; (part 2) vol. 8, no. 4, pp. 301–357, 1975; (part 3) vol. 9, no. 1, pp. 43–80, 1976.

[9] H. J. Zimmermann, Fuzzy Set Theory and Its Applications. Norwell, MA: Kluwer, 1991.

[10] S. M. Chen and W. T. Jong, “Fuzzy query translation for relational database systems,” IEEE Trans. Syst., Man, Cybern. B, vol. 27, pp. 714–721, Aug. 1997.

[11] J. W. Han and M. Kamber, Data Mining: Concepts and Techniques. San Mateo, CA: Morgan Kaufmann, 2001.

[12] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, “Selecting fuzzy if-then rules for classification problems using genetic algorithms,” IEEE Trans. Fuzzy Syst., vol. 3, pp. 260–270, Aug. 1995.

[13] H. Ishibuchi, T. Nakashima, and T. Murata, “Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems,” IEEE Trans. Syst., Man, Cybern. B, vol. 29, pp. 601–618, Oct. 1999.

[14] Y. C. Hu, R. S. Chen, and G. H. Tzeng, “Mining fuzzy association rules for classification problems,” Comput. Ind. Eng., to be published.

[15] L. X. Wang and J. M. Mendel, “Generating fuzzy rules by learning from examples,” IEEE Trans. Syst., Man, Cybern., vol. 22, no. 6, pp. 1414–1427, 1992.

[16] C. T. Sun, “Rule-base structure identification in an adaptive-network-based fuzzy inference system,” IEEE Trans. Fuzzy Syst., vol. 2, no. 1, pp. 64–73, 1994.

[17] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.

[18] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo, “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. AAAI Press, 1995, pp. 307–328.

[19] H. L. Li and P. L. Yu, “Optimal competence set expansion using deduction graph,” J. Optim. Theory Appl., vol. 80, no. 1, pp. 75–91, 1994.

[20] Y. C. Hu, G. H. Tzeng, and R. S. Chen, “Discovering fuzzy concepts for expanding competence set,” in Proc. 2nd Int. Symp. Advanced Intelligent Systems, Daejeon, Korea, 2001, pp. 396–401.

[21] W. Pedrycz and F. Gomide, An Introduction to Fuzzy Sets: Analysis and Design. Cambridge, MA: MIT Press, 1998.

[22] P. Adriaans and D. Zantinge, Data Mining. Reading, MA: Addison-Wesley, 1996.

[23] J. M. Li, C. I. Chiang, and P. L. Yu, “Optimal multiple stage expansion of competence set,” Eur. J. Oper. Res., vol. 120, no. 3, pp. 511–524, 2000.

[24] P. L. Yu, Forming Winning Strategies: An Integrated Theory of Habitual Domains. New York: Springer-Verlag, 1990.

Web Newspaper Layout Optimization Using Simulated Annealing

Jesús González, Ignacio Rojas, Héctor Pomares, Moisés Salmerón, and Juan Julián Merelo

Abstract—The web newspaper pagination problem consists of optimizing the layout of a set of articles extracted from several web newspapers and sending it to the user as the result of a previous query. This layout should be organized in columns, as in real newspapers, and should be adapted to the client web browser configuration in real time. This paper presents an approach to the problem based on simulated annealing (SA) that solves the problem on-line, adapts itself to the client’s computer configuration, and supports articles with different widths.

Index Terms—Greedy algorithm, pagination, real-time optimization.

I. INTRODUCTION

Manuscript received March 12, 2000; revised January 8, 2002. This work was supported in part by the Spanish CICYT Project DPI2001-3219. This paper was recommended by Associate Editor C. Hsu.

The authors are with the Department of Computer Architecture and Computer Technology, E.T.S. Ingeniería Informática, University of Granada, E-18071 Granada, Spain.

Since the amount of information available on the Internet is growing day by day, when a user sends a query to a news site, a lot of information