Grey self-organizing feature maps

Yi-Chung Hu^a, Ruey-Shun Chen^a, Yen-Tseng Hsu^b, Gwo-Hshiung Tzeng^c,*

a Institute of Information Management, National Chiao Tung University, Hsinchu, Taiwan, ROC
b Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, ROC
c Institute of Management of Technology, National Chiao Tung University, Hsinchu, Taiwan, ROC

Received 10 September 2000; accepted 13 August 2001

Abstract

In each training iteration of the self-organizing feature maps (SOFM), the adjustable output nodes are determined by the neighborhood size of the winning node. However, the SOFM learning rule seems to ignore some important information, namely the relationships that actually exist between the input training data and each adjustable output node. By viewing the input data and each adjustable node as a reference sequence and a comparative sequence, respectively, the grey relations between these sequences can be seen. This paper thus incorporates the grey relational coefficient into the learning rule of the SOFM, and a grey clustering method, namely the GSOFM, is proposed. From the simulation results, we can see that the best result of the proposed method on the iris data outperforms those of other known unsupervised neural network models. Furthermore, the proposed method can effectively solve the traveling salesman problem. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Self-organizing feature maps; Grey relation; Grey clustering; Traveling salesman problem

1. Introduction

Kohonen originally proposed the self-organizing feature maps (SOFM) learning algorithm in 1984 [19], and since then it has served as a powerful tool for a variety of applications, including problem solving for pattern recognition and image processing. The SOFM can map the distribution of input data with any number of dimensions to a one- or two-dimensional feature map graph, preserving the statistical properties of the data distribution [16,17,3]. Furthermore, each output node of the SOFM is restricted to a smaller distance around the cluster center in the cluster analysis [3]. Significantly, this paper demonstrates that the problem-solving capability of the SOFM can be enhanced by incorporating grey relations, previously proposed by Deng [10], into the SOFM.

* Corresponding author. Fax: +886-3-5753926. E-mail address: [email protected] (G.-H. Tzeng).

Fig. 1. Basic model of SOFM (an input layer with inputs x1 and x2, fully connected to the output nodes of the Kohonen layer).

We show the basic model of the SOFM in Fig. 1. The model has two layers: the Kohonen layer, consisting of multiple output nodes arranged in one or two dimensions, and the input layer. The two layers are fully connected, and each connection is given an adjustable weight. Let the number of output nodes be $m$, the number of input nodes be $n$, and $w_i = (w_{i1}, w_{i2}, \ldots, w_{in})$ $(1 \le i \le m)$ be the connection weight vector corresponding to output node $i$. Thus, $w_i$ can be viewed as the center of cluster $i$.

Whenever a new input training datum $x = (x_1, x_2, \ldots, x_n)$ is presented to the network during the training phase of the SOFM, the output value $o_i$ of output node $i$ is obtained as the square of the Euclidean distance between $x$ and $w_i$:

$$o_i = (d_i)^2 = \|x - w_i\|^2 = \sum_{j=1}^{n} (x_j - w_{ij})^2, \quad 1 \le i \le m. \tag{1}$$

If the node $i^*$ satisfies Eq. (2), then it is the winner:

$$(d_{i^*})^2 = \min_i o_i, \quad 1 \le i \le m. \tag{2}$$

The adjustable output nodes, comprising the winning node $i^*$ and its neighbor nodes, are determined by the neighborhood size of the winning node, denoted by $\Lambda_{i^*}$. Subsequently, the connection weights of the adjustable nodes are all updated. The learning rule of the SOFM is as follows [19,16,17]:

$$\Delta w_{ij} = \eta\,(x_j - w_{ij}), \quad i \in \Lambda_{i^*}, \quad 1 \le j \le n, \tag{3}$$

where $\eta$ is the learning rate. To achieve better convergence, $\eta$ and $\Lambda_{i^*}$ should be decreased gradually with learning time [17,22]. After sufficient training time, the SOFM can map the distribution of input data of any dimension to the Kohonen feature map.
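As an illustration of the winner selection and weight update described above, the following is a minimal Python sketch of one SOFM training step on a one-dimensional Kohonen layer. The function name `sofm_step`, the use of NumPy, and the treatment of the neighborhood as an index radius around the winner are assumptions made for this example, not details given in the paper.

```python
import numpy as np

def sofm_step(w, x, eta, radius):
    """One SOFM update on a 1-D Kohonen layer (illustrative sketch).

    w      : (m, n) array, one weight vector per output node
    x      : (n,) input vector
    eta    : learning rate
    radius : assumed neighborhood radius, in node indices, around the winner
    """
    o = np.sum((x - w) ** 2, axis=1)             # squared Euclidean distances
    winner = int(np.argmin(o))                   # node with the smallest output value
    idx = np.arange(w.shape[0])
    adjustable = np.abs(idx - winner) <= radius  # winner plus its neighbors
    w_new = w.copy()
    # Move every adjustable node toward x by a fraction eta of the difference.
    w_new[adjustable] += eta * (x - w[adjustable])
    return w_new, winner

w = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
x = np.array([0.9, 1.2])
w_new, winner = sofm_step(w, x, eta=0.5, radius=0)
print(winner, w_new[winner])
```

With `radius=0` only the winner moves, which corresponds to a neighborhood that has shrunk to cover the winning node alone.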

By inspecting Eq. (3), we can see that the movable amount is determined only by the learning rate and the difference between $x_j$ and $w_{ij}$. However, it seems that the SOFM ignores some important information in the learning rule, namely the relationships that actually exist between the input training data and each adjustable output node. Indeed, there exist distinct relationships between any two subsystems in the real world [10,24], although we do not know exactly what these relationships are. Grey theory, as proposed by Deng [10], can perform grey relational analysis for these subsystems by dealing with finite and incomplete output data series obtained from them [15]. Given one reference sequence, for example $x$, and some comparative sequences, for example $w_i$ $(1 \le i \le m)$, we can easily obtain the grey relation between each pair of corresponding data in these sequences by viewing the reference sequence as a desired goal. Therefore, we consider that the learning rule should take into account the grey relation which actually exists between $w_{ij}$ and $x_j$. Such a significant relation is called the grey relational coefficient (GRC).

The connection weight $w_{ij}$ can thus acquire more movement if there exists a larger GRC between $w_{ij}$ and $x_j$. This paper incorporates the GRC into the learning rule of the SOFM, and we refer to this novel combination as grey self-organizing feature maps (GSOFM), which can thus be viewed as a grey clustering method. This is the main difference between the original SOFM and the GSOFM.

In the following sections, we first review the concepts of the GRC and describe how to compute the GRC between $x_j$ and $w_{ij}$ in Section 2. In Section 3, we describe the GSOFM learning algorithm in detail. To show the problem-solving capability of the GSOFM, in Section 4 its performance is examined by computer simulations on two representative kinds of problems: classification problems, including the iris data proposed by Fisher [11], the appendicitis data and the wine recognition data; and the traveling salesman problem (TSP). In the first simulation, we compare the best result of the GSOFM with that of the SOFM on each problem. Moreover, the best result of the GSOFM on the iris data is compared with those of other known unsupervised neural network models. For applying neural networks with unsupervised learning to classification problems, the summarized results demonstrate the effectiveness and feasibility of the GSOFM. In the latter simulation, we first briefly introduce the TSP. Since it seems that the learning algorithm introduced in Section 3 simplifies the mechanism for lateral feedback, we incorporate the neighborhood function into the learning rule. A complete learning algorithm of the GSOFM for solving the TSP is described in Section 4.2. Subsequently, we apply the proposed method to problems from the TSPLIB problem set proposed by Reinelt [23] to show the effectiveness of the GSOFM.

2. Grey relational coefficient (GRC)

Grey relational analysis is a method that can find the relationships between one major sequence and the other sequences in a given system [14]. Given the reference sequence $x = (x_1, x_2, \ldots, x_n)$ and the comparative sequences $w_i = (w_{i1}, w_{i2}, \ldots, w_{in})$ $(1 \le i \le m)$ in normalized form, the GRC $\gamma_{ij}$ between $x_j$ and $w_{ij}$ $(1 \le j \le n)$ can be computed as [24,15,14,8]

$$\gamma_{ij} = \frac{\Delta_{\min} + \zeta\,\Delta_{\max}}{\Delta_{ij} + \zeta\,\Delta_{\max}}, \tag{4}$$

where $\zeta$ is the discriminative coefficient $(0 \le \zeta \le 1)$, and usually $\zeta = 0.5$ [14,8]; and

$$\Delta_{\min} = \min_i \min_j |x_j - w_{ij}|, \quad 1 \le i \le m, \quad 1 \le j \le n, \tag{5}$$

$$\Delta_{\max} = \max_i \max_j |x_j - w_{ij}|, \quad 1 \le i \le m, \quad 1 \le j \le n, \tag{6}$$

$$\Delta_{ij} = |x_j - w_{ij}|, \tag{7}$$

where $|\cdot|$ denotes the absolute value. Clearly, $\gamma_{ij}$ lies between zero and one. Moreover, $\gamma_{ij}$ approaches one if $\Delta_{ij}$ is near $\Delta_{\min}$. That is, the larger the degree of relationship between $x_j$ and $w_{ij}$, the more movement $w_{ij}$ should acquire toward $x_j$. Thus $\gamma_{ij}$ is incorporated into the learning rule of the SOFM. We should note that the appropriate value of $\zeta$ actually depends on the individual application.
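To make the computation of the grey relational coefficient concrete, here is a minimal Python sketch that evaluates the coefficient for every pair of input component and weight component at once. The function name `grey_relational_coefficients`, the parameter name `zeta` for the discriminative coefficient, and the handling of the degenerate case where all differences are zero are assumptions for this illustration.

```python
import numpy as np

def grey_relational_coefficients(x, w, zeta=0.5):
    """Grey relational coefficients between x (n,) and each row of w (m, n).

    The minimum and maximum absolute differences are taken over all
    nodes i and all components j, as in the definitions above.
    """
    delta = np.abs(x - w)        # absolute differences, shape (m, n)
    d_min = delta.min()
    d_max = delta.max()
    if d_max == 0.0:
        # Assumption: if every weight already equals the input,
        # treat all coefficients as one to avoid dividing by zero.
        return np.ones_like(delta)
    return (d_min + zeta * d_max) / (delta + zeta * d_max)

x = np.array([0.2, 0.8])
w = np.array([[0.2, 0.4],
              [0.6, 0.8]])
gamma = grey_relational_coefficients(x, w)
print(gamma)
```

In this small example the components that coincide with the input receive a coefficient of one, while the components that differ by 0.4 receive 0.2 / 0.6, or about 0.33.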

Unlike correlation analysis, which only stresses the relationship between any two random variables, grey relational analysis tries to find the relationships between one reference sequence and other comparative sequences by viewing the reference sequence as a desired goal that each comparative sequence expects to attain. In the following section, the learning algorithm for the grey self-organizing feature maps is introduced.

3. Grey self-organizing feature maps (GSOFM)

The learning algorithm of the GSOFM is categorized as unsupervised learning; that is, we need not know the desired output of each training datum during the training phase. Before training, we usually normalize all the input data and weight vectors [17]. Similar to the SOFM, the training phase of the GSOFM is typically composed of the ordering phase and the convergence phase [17,22]. Initially, $\eta$ should be chosen close to 1.0, and $\Lambda_i$ should cover all output nodes. During the ordering phase, $\eta$ gradually decreases but not below 0.1. $\Lambda_i$ also decreases slowly, as depicted in Fig. 2 [17,22], where $t_1$ and $t_2$ are numbers of iterations and $0 < t_1 < t_2$. At the end of this phase, both $\eta$ and $\Lambda_i$ reach much smaller values, and they continue to decrease during the convergence phase. In principle, $\eta$ will not be decreased below a given value, say 0.05, and $\Lambda_i$ will decrease until it covers only the node itself. It should be noted that both $\eta$ and $\Lambda_i$ $(1 \le i \le m)$ are decreased at the end of each iteration, or complete pass (i.e., after each training datum has been presented to the network).

Fig. 2. Neighborhood size of the winning node gradually decreases with time (neighborhoods $\Lambda_i(0)$, $\Lambda_i(t_1)$ and $\Lambda_i(t_2)$ around the winning node).

As we have stated in the previous section, $x = (x_1, x_2, \ldots, x_n)$ and $w_i = (w_{i1}, w_{i2}, \ldots, w_{in})$ $(1 \le i \le m)$ are the reference sequence and the comparative sequences, respectively. Note that the value of $m$, serving as the number of clusters, must be specified before the training task is performed. Significantly, the learning rule of the GSOFM is as follows:

$$\Delta w_{ij} = \eta\,(\gamma_{ij})^k\,(x_j - w_{ij}), \quad i \in \Lambda_{i^*}, \quad 1 \le j \le n, \tag{8}$$

where $k$ is a pre-specified positive real number, and $\gamma_{ij}$ is the GRC between $x_j$ and $w_{ij}$. This implies that if $\gamma_{ij}$ is much smaller, then the value of $(\gamma_{ij})^k$ will approach zero when $k$ is a larger value. On the other hand, $\gamma_{ij}$ will be dampened by a much larger value of $k$. Significantly, the connection weight $w_{ij}$ can acquire a large amount of movement if there exists a larger GRC between $w_{ij}$ and $x_j$. We describe the learning algorithm of the GSOFM as follows.

Algorithm: Grey self-organizing feature maps learning algorithm
Input: A given set of training data.
Output: The center of each cluster.
Method:

Step 1: Initialize connection weights and parameters
  a. Initialize the weights corresponding to each output node with small random values;
  b. Initialize $\eta(0)$ and the number of neighbor nodes $\Lambda_i(0)$ of node $i$: $\eta(0)$ should approach 1.0, and $\Lambda_i(0)$ should cover all output nodes, $1 \le i \le m$;
  c. Set $t = 1$, where $t$ is an iteration counter.

Step 2: Present input training data $x(t)$

Step 3: Calculate the output value $o_i(t)$ of each output node $i$

$$o_i(t) = (d_i(t))^2 = \sum_{j=1}^{n} (w_{ij}(t) - x_j(t))^2, \quad 1 \le i \le m.$$

Step 4: Determine the winning node $i^*$
  The node $i^*$ is the winner if

$$(d_{i^*}(t))^2 = \min_i o_i(t), \quad 1 \le i \le m.$$

Step 5: Adjust the winning node $i^*$ and its neighbor nodes
  a. The neighbor nodes around the winning node $i^*$ are determined by $\Lambda_{i^*}(t)$;
  b. The learning rule based on $\gamma_{ij}(t)$ can be given, as in Eq. (8), by

$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\,[\gamma_{ij}(t)]^k\,[x_j(t) - w_{ij}(t)], \quad i \in \Lambda_{i^*}(t), \quad 1 \le j \le n,$$

  where $k$ is a pre-specified positive real number, and $\gamma_{ij}(t)$ is the GRC between $x_j(t)$ and $w_{ij}(t)$. If each training datum has been presented to the network, then go to Step 6; otherwise go to Step 2.

Step 6: Shrink the learning rate $\eta(t)$ and the neighborhood size $\Lambda_i(t)$
  $\eta(t)$ and $\Lambda_i(t)$ may shrink gradually with linear or exponential time, where $1 \le i \le m$.

Step 7: Convergence test
  If the winning node of each input datum has not changed, then stop. Otherwise, set $t + 1$ to $t$ and go to Step 2.
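The steps above can be sketched in Python as follows. This is only an illustrative reading of the algorithm under several assumptions that the paper does not fix: a one-dimensional Kohonen layer whose neighborhood is an index radius, a linear decrease of the learning rate with a floor, a radius that shrinks by one every `shrink_every` passes, and initial weights drawn uniformly from [0, 0.1]. All function and parameter names are hypothetical.

```python
import numpy as np

def train_gsofm(data, m, k=1.0, zeta=0.5, eta0=1.0, eta_min=0.05,
                eta_step=0.005, shrink_every=100, max_iter=500, seed=0):
    """Illustrative GSOFM training loop; data is a (P, n) array in [0, 1]."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    w = rng.uniform(0.0, 0.1, size=(m, n))    # Step 1a: small random weights
    eta = eta0                                # Step 1b: learning rate near 1.0
    radius = m - 1                            # Step 1b: cover all output nodes
    prev = None
    winners = np.empty(len(data), dtype=int)
    for t in range(1, max_iter + 1):
        winners = np.empty(len(data), dtype=int)
        for p, x in enumerate(data):          # Step 2: present each datum
            o = np.sum((w - x) ** 2, axis=1)  # Step 3: output values
            i_star = int(np.argmin(o))        # Step 4: winning node
            winners[p] = i_star
            delta = np.abs(x - w)             # grey relational coefficients
            d_max = delta.max()
            if d_max > 0.0:
                gamma = (delta.min() + zeta * d_max) / (delta + zeta * d_max)
            else:
                gamma = np.ones_like(delta)   # assumption for the degenerate case
            near = np.abs(np.arange(m) - i_star) <= radius      # Step 5a
            w[near] += eta * gamma[near] ** k * (x - w[near])   # Step 5b
        eta = max(eta_min, eta - eta_step)    # Step 6: shrink learning rate
        if t % shrink_every == 0 and radius > 0:
            radius -= 1                       # Step 6: shrink neighborhood
        if prev is not None and np.array_equal(prev, winners):
            break                             # Step 7: winners unchanged
        prev = winners
    return w, winners

data = np.array([[0.00, 0.00], [0.05, 0.02], [0.02, 0.04],
                 [1.00, 1.00], [0.95, 0.97], [0.98, 0.94]])
centers, winners = train_gsofm(data, m=2)
print(centers)
print(winners)
```

Because each update is a convex combination of the old weight and the input (the step factor lies between zero and one), the weights stay inside the range spanned by the initial weights and the data. The quality of the resulting clusters depends on `k` and `zeta`, which the paper tunes per problem.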

Empirically, many thousands of iterations are necessary for the GSOFM to achieve convergence. We can see that the learning rule of the GSOFM is not determined only by the learning rate and the difference between $w_{ij}(t)$ and $x_j(t)$. To show its effectiveness, we apply the GSOFM to two representative kinds of problems: classification problems, including the iris data proposed by Fisher [11], the appendicitis data and the wine recognition data; and the TSP. Simulations with the corresponding parameter specifications are described in Section 4.

4. Simulations

To examine the performance of the GSOFM, we first employ it to obtain classification rates on well-known data sets: the iris data, the appendicitis data and the wine recognition data. Subsequently, we show that the GSOFM can effectively solve the TSP in comparison with other known neural network models. All programs were coded in Delphi version 5.0 and executed on a personal computer with a Pentium III-500 CPU. It should be noted that we stress the feasibility and the problem-solving capability of the GSOFM, rather than providing formal methods for finding general parameter specifications that optimize each problem.

Table 1
The best result (96.00%) obtained by the GSOFM for the iris data with various $k$ versus $\zeta$

| $\zeta$ | $k$ |
|---|---|
| 0.02 | 6.30, 6.38 |
| 0.03 | 5.23 |
| 0.07 | 7.78, 8.01, 8.42 |
| 0.08 | 7.71 |
| 0.09 | 5.41, 7.76, 8.84, 8.87 |
| 0.10 | 7.98, 8.51, 9.18, 9.40 |
| 0.11 | 9.52 |
| 0.13 | 8.47, 9.09, 9.23, 9.24 |
| 0.14 | 8.83, 9.12 |
| 0.16 | 9.44, 9.45, 9.58, 9.64, 9.67 |
| 0.17 | 8.13, 9.51 |
| 0.21 | 8.06 |
| 0.23 | 6.62 |
| 0.34 | 7.65 |
| 0.39 | 8.90 |

4.1. Performance for classification problems

We compare the best result of the GSOFM with that of the SOFM for each problem. Moreover, the best result of the GSOFM on the iris data is compared with those of other known unsupervised neural network models. Parameter specifications with which the GSOFM obtains its best results are reported in the following subsections.

4.1.1. The iris data

The iris data consists of three classes (class 1: iris setosa; class 2: iris versicolor; class 3: iris virginica), and each class consists of fifty data with four dimensions. Moreover, class 2 overlaps class 3.

The Kohonen layer is implemented by a one-dimensional array. The initial parameter specification, including $m$, $\eta$ and $\Lambda_i$, is as follows:

$$m = 3; \quad \eta(0) = 1.0; \quad \Lambda_i(0) = 2, \quad 1 \le i \le m.$$

During the training phase, $\eta$ is gradually decreased by a small, fixed amount (i.e., 0.005) at the end of each iteration, but it is not decreased below a given value (i.e., 0.05). Similar to Fig. 2, $\Lambda_i$ is gradually decreased after every 100 iterations. In decreasing both $\eta$ and $\Lambda_i$, we follow the principles described in Section 3. We examine the performance by the number of misclassified data over various $k$ versus $\zeta$ (i.e., $0 \le k \le 10$; $0.0 \le \zeta \le 1.0$), and the best result that the GSOFM can attain is 96.00% (i.e., the misclassified number is 6). Simulation results are summarized in Table 1. From this table, we can see that the best result can be obtained for $\zeta < 0.4$ by carefully tuning the parameters.
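The parameter schedule described for the iris simulation can be sketched in Python as below. The fixed decrement of 0.005, the floor of 0.05 and the 100-iteration interval come from the text; reducing the neighborhood size by exactly one node at each interval, and the function names, are assumptions made for this illustration.

```python
def learning_rate(t, eta0=1.0, step=0.005, floor=0.05):
    """Learning rate after t completed iterations (linear decrease with a floor)."""
    return max(floor, eta0 - step * t)

def neighborhood_size(t, size0=2, every=100):
    """Number of neighbor nodes after t completed iterations.

    Assumption: the size drops by one every `every` iterations until it
    reaches zero, i.e., the neighborhood covers only the node itself.
    """
    return max(0, size0 - t // every)

for t in (0, 50, 100, 200, 1000):
    print(t, learning_rate(t), neighborhood_size(t))
```

Under these assumptions the learning rate reaches its floor after 190 iterations, and the neighborhood contains only the winning node from iteration 200 onward.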

Next, we compare the best result of the GSOFM with those of other known neural network models that have been applied to the iris data. These models include the generalized learning vector quantization (GLVQ) [20], the unsupervised fuzzy competitive learning (UFCL) [21], the soft competition scheme (SCS) [6], and the descending fuzzy learning vector quantization (↓FLVQ) [6]. The fuzzy c-means (FCM) [21,7] is also taken into account. From Table 2, we can see that the best result of the GSOFM is superior to those of the other unsupervised neural network models.

Table 2
Comparison of the best result of the GSOFM with those of other known unsupervised neural networks

| GSOFM (%) | SOFM (%) | LVQ (%) | GLVQ (%) | ↓FLVQ (%) | UFCL (%) | SCS (%) | FCM (%) |
|---|---|---|---|---|---|---|---|
| 96.00 | 88.00 | 89.33 | 88.67 | 88.67 | 90.00 | 89.33 | 91.33 |

4.1.2. The appendicitis data

The appendicitis data consists of 106 cases classified into two classes with seven attributes. The initial parameter specification, including $m$, $\eta$ and $\Lambda_i$, is as follows:

$$m = 2; \quad \eta(0) = 1.0; \quad \Lambda_i(0) = 1, \quad 1 \le i \le m.$$

The method for decreasing both $\eta$ and $\Lambda_i$ is the same as that used in Section 4.1.1. By carefully tuning the values of $k$ and $\zeta$ (i.e., $0 \le k \le 10$; $0.0 \le \zeta \le 1.0$), the best result that the GSOFM can attain is 86.79% (i.e., the misclassified number is 14). We also find that the best result is obtained only when $\zeta = 0.07$. On the other hand, the best result for the SOFM is 78.30% (i.e., the misclassified number is 23), clearly worse than that of the GSOFM.

4.1.3. The wine recognition data

The wine recognition data, which are the results of a chemical analysis of three types of wines, consists of 178 cases classified into three classes with 13 continuous attributes. The initial parameter specifications, including $m$, $\eta$ and $\Lambda_i$, and the corresponding decreasing method are the same as those described in Section 4.1.2. Using the SOFM, we find that the classification result is 92.13% (i.e., the misclassified number is 14). By carefully tuning the values of $k$ and $\zeta$ (i.e., $0 \le k \le 10$; $0.0 \le \zeta \le 1.0$), the best result of the GSOFM is 96.63% (i.e., the misclassified number is 6). We also find that the best result is obtained only when $\zeta = 0.02$. From the viewpoint of the best classification capability, the GSOFM again outperforms the SOFM.

From the simulation results, we can see that the best classification capability of the SOFM can be enhanced by incorporating grey relations into the learning rule. For applying neural networks to classification problems, the simulation results thus demonstrate the effectiveness and the feasibility of the GSOFM.
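The classification rates quoted in this subsection follow directly from the data set sizes and the misclassified counts. The short Python check below reproduces them; the helper name `classification_rate` is introduced here only for illustration.

```python
def classification_rate(n_cases, n_misclassified):
    """Percentage of correctly classified cases."""
    return 100.0 * (n_cases - n_misclassified) / n_cases

results = [
    ("iris, GSOFM", 150, 6),
    ("appendicitis, GSOFM", 106, 14),
    ("appendicitis, SOFM", 106, 23),
    ("wine, GSOFM", 178, 6),
    ("wine, SOFM", 178, 14),
]
for name, n_cases, n_miss in results:
    print(f"{name}: {classification_rate(n_cases, n_miss):.2f}%")
```

For example, 6 misclassified cases out of 150 iris data give 144/150 = 96.00%.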

4.2. Performance for the traveling salesman problem (TSP)

The TSP can be stated as follows: “Given N cities, find the shortest path for a salesman so that he can visit all the cities exactly once” [9]. The TSP is a combinatorial optimization problem and is known to be NP-complete [3]. In addition to the first successful neural model proposed by Hopfield and Tank [13], other neural network models for solving the TSP have been proposed. Some approaches have been well surveyed and simulated by Aras et al. [3], for example, the guilty net (GN) by Burke and Damany [5], the method of Angeniol et al. (AVL) [2], and the KNIES-TSP (KL) and the KNIES-TSP-Global (KG) by Aras et al. [3].

On the other hand, the SOFM can also be used to solve the TSP, with the number of output nodes chosen by trial and error. The SOFM can give a near-optimal solution [9]. However, the quality of the solution depends on the number of output nodes. If we do not find an acceptable path after a sufficiently long time, then the path is not useful and extra output nodes are added. Since more than one node can be attracted to the same city, it is actually best to have more nodes than cities [12]. The number is usually 2, 3 or 4 times N. The number of output nodes of the GSOFM is thus experimentally set to three times the number of cities.

In this section, we employ the GSOFM to solve the TSP to determine its effectiveness. However, it seems that poor results are obtained if we apply the learning algorithm presented in Section 3 to the TSP, since it simplifies the lateral feedback mechanism [22]. Thus, it is necessary to incorporate into the learning rule the neighborhood function $U_{i^*}$, which is a type of Gaussian function, around the winning node $i^*$:

$$U_{i^*} = \exp(-d_{ii^*}/\sigma), \quad i \in \Lambda_{i^*}, \quad 1 \le i \le m. \tag{9}$$

Here, $d_{ii^*}$ is the cardinal distance [1] measured along the ring between the nodes $i$ and $i^*$:

$$d_{ii^*} = \min\{|i^* - i|,\; m - |i^* - i|\}, \tag{10}$$

where $|\cdot|$ denotes the absolute value, and $i^*$ and $i$ are the labels of the winning node and the node $i$, respectively. As for $\sigma$, it is called the “gain parameter” [3,1], reflecting the scope of the neighborhood [3], and it is decreased at the end of each complete pass by Eq. (11) [1]:

$$\sigma(t + 1) = \alpha\,\sigma(t), \tag{11}$$

where $0 \le \alpha \le 1$. A detailed learning algorithm of the GSOFM for solving the TSP is described as follows.

Algorithm: GSOFM learning algorithm for solving the TSP
Input: Given N cities.
Output: Find the shortest path for a salesman so that he can visit all the cities exactly once.
Method:

Step 1: Initialize connection weights and parameters
  a. Initialize the weights corresponding to each output node with small random values;
  b. Let $\eta(0) = 1.0$, and $\Lambda_i(0) = 3(m - 1)$, where $1 \le i \le m$, i.e., the total number of output nodes is $3m$;
  c. Randomize the order of the cities and label the cities $1, \ldots, N$. The variable $r$ indexes the order of the cities; set $r = 1$, where $1 \le r \le N$. In addition, we assign the label $i$ to the node $i$, where $1 \le i \le m$;
  d. Set $t = 1$, where $t$ is an iteration counter.

Step 2: Present the $r$th city $x^{(r)}(t)$

Step 3: Calculate the output value $o_i(t)$ of each output node $i$

$$o_i(t) = (d_i(t))^2 = \sum_{j=1}^{n} (w_{ij}(t) - x_j^{(r)}(t))^2, \quad 1 \le i \le m.$$

Step 4: Determine the winning node $i^*$
  The node $i^*$ is the winner if

$$(d_{i^*}(t))^2 = \min_i o_i(t), \quad 1 \le i \le m.$$

Step 5: Adjust the winning node $i^*$ and its neighbor nodes
  a. The neighbor nodes around the winning node $i^*$ are determined by $\Lambda_{i^*}(t)$;
  b. The learning rule based on $\gamma_{ij}(t)$ can be given as

$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\,[\gamma_{ij}(t)]^k\,U_{i^*}(t)\,[x_j^{(r)}(t) - w_{ij}(t)], \quad i \in \Lambda_{i^*}(t), \quad 1 \le j \le n, \tag{12}$$

  where $k$ is a pre-specified positive real number, and $\gamma_{ij}(t)$ is the GRC between $x_j^{(r)}(t)$ and $w_{ij}(t)$.

Step 6: Increment the value of $r$
  If $r$ equals $N$, then
  a. Shrink the gain parameter $\sigma(t)$ as in Eq. (11);
  b. Shrink the learning rate $\eta(t)$ and the neighborhood size $\Lambda_i(t)$;
  c. Set $t + 1$ to $t$.
  Go to Step 7. Otherwise, set $r + 1$ to $r$ and go to Step 2.

Step 7: Convergence test
  Check whether or not the locations of the output nodes are within an acceptable distance of the cities. If yes, then stop. Otherwise, set $t + 1$ and 1 to $t$ and $r$, respectively, and go to Step 2.
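The core of the TSP variant, i.e., the ring distance, the neighborhood function and the weight update of Step 5, can be sketched in Python as follows. This is an illustrative sketch only: the neighborhood function is taken as exp(-d/sigma), following the form in which Eq. (9) appears here, the neighborhood is an assumed ring radius `size`, and the names `ring_distance` and `gsofm_tsp_step` are hypothetical.

```python
import numpy as np

def ring_distance(i, i_star, m):
    """Distance between nodes i and i_star measured along a ring of m nodes."""
    d = abs(i_star - i)
    return min(d, m - d)

def gsofm_tsp_step(w, city, eta, sigma, size, k=1.0, zeta=0.5):
    """One presentation of a city to a ring of m nodes (illustrative sketch).

    w    : (m, 2) node positions
    city : (2,) city coordinates
    size : assumed neighborhood radius along the ring
    """
    m = w.shape[0]
    o = np.sum((w - city) ** 2, axis=1)         # Step 3: output values
    i_star = int(np.argmin(o))                  # Step 4: winning node
    delta = np.abs(city - w)                    # grey relational coefficients
    d_max = delta.max()
    if d_max > 0.0:
        gamma = (delta.min() + zeta * d_max) / (delta + zeta * d_max)
    else:
        gamma = np.ones_like(delta)             # assumption for the degenerate case
    d = np.array([ring_distance(i, i_star, m) for i in range(m)])
    near = d <= size                            # Step 5a: nodes to adjust
    u = np.exp(-d / sigma)                      # neighborhood function value per node
    w_new = w.copy()
    # Step 5b: update scaled by learning rate, GRC and neighborhood function.
    w_new[near] += eta * gamma[near] ** k * u[near, None] * (city - w[near])
    return w_new, i_star

w = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
city = np.array([0.9, 0.1])
w_new, i_star = gsofm_tsp_step(w, city, eta=0.5, sigma=1.0, size=0)
print(i_star, w_new[i_star])
```

After training, one common way to read off a tour is to visit the cities in the order of the ring indices of their winning nodes; that readout is not spelled out in the algorithm above and is mentioned here only as a possibility.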

During the training phase, $\eta$ is gradually decreased by a small, fixed amount (i.e., 0.0005) at the end of each iteration, but it is not decreased below a given value (i.e., 0.05). We employ the same method described in Section 4.1.1 to gradually decrease $\Lambda_i$ during the training phase. For simplicity, we set $\zeta = 0.5$, which is commonly used in other applications [14], and set $k = 1$. Therefore, the initial values of $\sigma$ and $\alpha$ are the two tunable variables that determine whether or not the GSOFM can find a high-quality solution at convergence. Using the eight problems shown in Table 3 from the TSPLIB proposed by Reinelt [23], we compare the best solution of the GSOFM with those of other neural network models, including the SOFM, the GN, the AVL method, the KNIES-TSP (KL) and the KNIES-TSP-Global (KG), all as reported in [3]. Note that the cities in each selected problem are spread over the two-dimensional Euclidean space, and the Euclidean norm is used to compute the distance between any two cities. Simulation results are summarized in Table 4.

Table 3
Eight problems selected from the TSPLIB

| Problem | Number of cities | Optimal length |
|---|---|---|
| bier127 | 127 | 118282 |
| eil51 | 51 | 426 |
| eil76 | 76 | 538 |
| eil101 | 101 | 629 |
| pr107 | 107 | 44303 |
| pr136 | 136 | 96772 |
| rd100 | 100 | 7910 |
| st70 | 70 | 675 |

Table 4
Comparison of the best result of the GSOFM with those of the SOFM, GN, AVL, KL and KG

| Problem | Optimal length | GSOFM ($\sigma(0)$, $\alpha$) | SOFM | GN | AVL | KL | KG |
|---|---|---|---|---|---|---|---|
| bier127 | 118282 | 121181.3 (55, 0.92) | 122211.7 | 155163.2 | 122673.9 | 121548.7 | 121923.7 |
| eil51 | 426 | 437.71 (36, 0.91) | 443.9 | 470.7 | 443.5 | 438.2 | 438.2 |
| eil76 | 538 | 562.41 (139, 0.74) | 571.2 | 614.3 | 571.3 | 564.8 | 567.5 |
| eil101 | 629 | 658.04 (85, 0.85) | 688.7 | 771.9 | 671.4 | 658.3 | 664.4 |
| pr107 | 44303 | 44483.46 (147, 0.87) | 44504.3 | 80481.3 | 45096.4 | 44628.3 | 44491.1 |
| pr136 | 96772 | 98956.42 (41, 0.78) | 103878.0 | 135887.7 | 103442.3 | 101156.8 | 101752.4 |
| rd100 | 7910 | 8143.72 (72, 0.69) | 8137.9 | 8731.2 | 8265.8 | 8075.7 | 8117.4 |
| st70 | 675 | 692.06 (21, 0.52) | 692.8 | 755.7 | 693.3 | 685.2 | 690.7 |

In Table 4, the first column shows the test problems used in our simulation. The second column shows the known optimal length for each problem. The real numbers in the third to eighth columns show the best solutions obtained over various parameter specifications by the GSOFM, SOFM, GN, AVL, KL and KG, respectively. In the third column, the numbers in parentheses are the parameter specifications (i.e., $\sigma(0)$ and $\alpha$) with which the best results are obtained by the GSOFM.

We find that the GSOFM outperforms the other neural network models for bier127, eil51, eil76, eil101, pr107 and pr136. In the case of st70, the result of the GSOFM is slightly inferior to those of KL and KG. Furthermore, to show the relative deviations from the optimal length, we summarize the results in Table 5. From this table, we may thus conclude that the GSOFM outperforms the other neural network models. Tables 4 and 5 thus show the feasibility and effectiveness of the GSOFM for solving the TSP.

Table 5
Deviation (%) from the optimal length of the various algorithms

| Problem | GSOFM | SOFM | GN | AVL | KL | KG |
|---|---|---|---|---|---|---|
| bier127 | 2.45 | 3.32 | 31.18 | 3.71 | 2.76 | 3.08 |
| eil51 | 2.75 | 4.20 | 10.49 | 4.11 | 2.86 | 2.86 |
| eil76 | 4.54 | 6.17 | 14.18 | 6.19 | 4.98 | 5.48 |
| eil101 | 4.62 | 9.49 | 22.72 | 6.74 | 4.65 | 5.63 |
| pr107 | 0.41 | 0.45 | 81.66 | 1.79 | 0.73 | 0.42 |
| pr136 | 2.26 | 7.34 | 40.42 | 6.89 | 4.53 | 5.15 |
| rd100 | 2.95 | 2.88 | 10.38 | 4.49 | 2.09 | 2.62 |
| st70 | 2.53 | 2.64 | 11.96 | 2.71 | 1.51 | 2.33 |
| Average | 2.81 | 4.56 | 27.87 | 4.58 | 3.01 | 3.45 |
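The deviations in Table 5 are the relative excess of each tour length over the optimal length, expressed as a percentage. The short Python check below reproduces a few GSOFM entries from the lengths in Table 4; the helper name `deviation_percent` is introduced here for illustration.

```python
def deviation_percent(length, optimal):
    """Relative deviation of a tour length from the optimal length, in percent."""
    return 100.0 * (length - optimal) / optimal

gsofm = {
    "bier127": (121181.3, 118282),
    "eil51": (437.71, 426),
    "pr107": (44483.46, 44303),
}
for name, (length, optimal) in gsofm.items():
    print(f"{name}: {deviation_percent(length, optimal):.2f}")
```

For bier127, for instance, (121181.3 - 118282) / 118282 is about 0.0245, i.e., the 2.45 listed in the table.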

5. Discussions and future works

Since the original SOFM ignores important information in the training phase, namely the relationships that actually exist between the input training data and each adjustable output node, we incorporate the grey relational coefficients into the learning rule of the SOFM and call the result the GSOFM. The GSOFM can be viewed as a grey clustering method. To show the problem-solving capability of the GSOFM, its performance is examined by computer simulations on two representative kinds of problems: classification problems, including the iris data proposed by Fisher [11], the appendicitis data and the wine recognition data; and the TSP, with instances selected from the TSPLIB problem set proposed by Reinelt [23].

In the classification problems, we find that the best result of the GSOFM outperforms that of the SOFM in each problem. Moreover, the best result of the GSOFM on the iris data is compared with those of other known unsupervised neural network models. Although the criteria for selecting a method for classification problems are subjective and dependent on the application, accuracy is always the primary goal [25]. For applying unsupervised neural networks to classification problems, the simulation results thus demonstrate the effectiveness and the feasibility of the GSOFM.

As for the TSP, we selected some problems from the TSPLIB to test the performance of the GSOFM. In our simulations, the number of output nodes is experimentally set to three times the number of cities. It is furthermore possible to check whether the quality of the solution depends on the number of output nodes, especially for large problems such as pcb442 and att532 in the TSPLIB. In addition, we can modify the GSOFM to dynamically add nodes to the Kohonen layer during the training phase to obtain solutions of good quality.

On the other hand, we are also very interested in using the GSOFM to solve problems encountered in other fields. For example, we may design a framework to integrate the GSOFM with fuzzy query processing. Previously, Kamel et al. proposed a clustering method for fuzzy query processing [18] from the viewpoint of enhancing the flexibility of existing database systems. The work of Kamel et al. provides a good basis for the future integration. Also, the GSOFM could serve as a data mining tool. The SOFM has been a powerful tool for data mining that can help a business to analyze the characteristics of customers from transaction databases. Therefore, it is possible to apply the GSOFM to knowledge discovery. For example, a large bank could try to understand customers who currently have home equity loans to determine the best strategy for increasing its market share [4].

6. Conclusions

By applying the GSOFM to classification problems and to the TSP, we can see that the simulation results demonstrate the effectiveness and feasibility of the GSOFM, and we will continue to study related topics. From the discussions and the future works mentioned above, we can see that it is worthwhile to measure the effectiveness and the feasibility of applying the GSOFM to fuzzy query processing and data mining.

Acknowledgements

We are very grateful to the anonymous referees for their valuable comments and constructive suggestions.

References

[1] M. Abdolhamid, S. Samerkae, E. Takao, A self-organizing neural network approach for multiple traveling salesman and vehicle routing problems, Int. Trans. Oper. Res. 6 (6) (1999) 591–606.

[2] B. Angeniol, C. Vaubois, J.Y. LeTexier Jr, Self-organizing feature maps and the traveling salesman problem, Neural Networks 1 (4) (1988) 289–293.

[3] N. Aras, B.J. Oommen, I.K. Altinel, The Kohonen network incorporating explicit statistics and its application to the traveling salesman problem, Neural Networks 12 (9) (1999) 1273–1284.

[4] M. Berry, G. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support, Wiley, New York, 1997.

[5] L.I. Burke, P. Damany, The guilty net for the traveling salesman problem, Comput. Oper. Res. 19 (4) (1992) 255–265.

[6] J.C. Bezdek, N.R. Pal, Two soft relatives of learning vector quantization, Neural Networks 8 (5) (1995) 729–743.

[7] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York, 1981.

[8] C.S. Cheng, Y.T. Hsu, C.C. Wu, Grey neural networks, IEICE Trans. Fundamentals Electron. Commun. Comput. Sci. E81-A (11) (1998) 2433–2442.

[9] S.T. Clifford, W.C. Siu, New approach for solving the travelling salesman problem using self-organizing learning, in: Proceedings of the IEEE International Conference on Neural Networks, Vol. 5, 1995, pp. 2632–2635.

[11] R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 7 (2) (1936) 179–188.

[12] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Santa Fe Institute Studies in the Sciences of Complexity, Addison-Wesley, MA, 1991.

[13] J.J. Hopfield, D.W. Tank, Neural computation of decisions in optimization problems, Biolog. Cybernet. 52 (1985) 141–152.

[14] Y.T. Hsu, C.M. Chen, A novel fuzzy logic system based on N-version programming, IEEE Trans. Fuzzy Systems 8 (2) (2000) 155–170.

[15] Y.P. Huang, C.H. Huang, Real-valued genetic algorithms for fuzzy grey prediction system, Fuzzy Sets Systems 87 (3) (1997) 265–276.

[16] T.L. Huntsberger, P. Ajjimarangsee, Parallel self-organizing feature maps for unsupervised pattern recognition, Int. J. of General Systems 16 (4) (1990) 357–372.

[17] J.S.R. Jang, C.T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, Prentice-Hall, Englewood Cliffs, NJ, 1997.

[18] M. Kamel, B. Hadfield, M. Ismail, Fuzzy query processing using clustering techniques, Inform. Process. Manage. 26 (2) (1990) 279–293.

[19] T. Kohonen, Self-Organizing Maps, Springer, Berlin, 1995.

[20] N.R. Pal, J.C. Bezdek, E.C.K. Tsao, Generalized clustering networks and Kohonen’s self-organizing scheme, IEEE Trans. Neural Networks 4 (4) (1993) 549–557.

[21] N.R. Pal, J.C. Bezdek, R.J. Hathaway, Sequential competitive learning and the fuzzy c-means clustering algorithms, Neural Networks 9 (5) (1996) 787–796.

[22] A.S. Pandya, R.B. Macy, Pattern Recognition with Neural Networks in C++, CRC Press, Boca Raton, FL, 1996.

[23] G. Reinelt, TSPLIB: a traveling salesman problem library, ORSA J. Comput. 3 (4) (1991) 376–384.

[24] K.Q. Shi, G.W. Wu, Y.P. Hwang, Theory of Grey Information Relation, Chuan Hwa, Taiwan, 1994.

[25] S.M. Weiss, C.A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, Morgan Kaufmann, CA, 1991.

Yi-Chung Hu received the B.S. degree in information and computer engineering from the Chung Yuan Christian University, Chungli, Taiwan, and the M.S. degree in computer and information science from the National Chiao Tung University, Hsinchu, Taiwan, in 1991 and 1993, respectively. From October 1993 to January 2000, he worked as a research assistant in the Chung-Shan Institute of Science and Technology, Taoyuan, Taiwan. He is currently working toward the Ph.D. degree in information management at the National Chiao Tung University. His primary research interests include soft computing, data mining, and multiple criteria decision making.

Ruey-Shun Chen received the Ph.D. degree in computer science and information engineering from the National Chiao Tung University, Hsinchu, Taiwan, in 1995. He is currently an Associate Professor at the National Chiao Tung University. His research interests include genetic algorithms, reliability, performance evaluation, business networks, and distributed systems.

Yen-Tseng Hsu received the B.S. degree in electrical engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, the M.S. degree in electrical engineering from the National Tsing Hua University, Hsinchu, Taiwan, and the Ph.D. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, in 1985, 1987, and 1991, respectively. He is currently a Professor at the National Taiwan University of Science and Technology. His research interests include fuzzy logic, neural networks, grey information theory, and VLSI/FPGA.

Gwo-Hshiung Tzeng received the B.S. degree in business management from the Tatung University, Taipei, Taiwan, the M.S. degree in urban planning from Chung Hsing University, Taipei, Taiwan, and the Ph.D. degree in management science from Osaka University, Osaka, Japan, in 1967, 1971, and 1977, respectively. He was a Research Associate at Argonne National Laboratory from July 1981 to January 1982, a Visiting Professor in the Department of Civil Engineering at the University of Maryland, College Park, from August 1989 to August 1990, a Visiting Professor in the Department of Engineering and Economic System and Operations Research (EESOR), Energy Modeling Forum at Stanford University, from August 1997 to August 1998, and a Professor at Chiao Tung University from 1981 to the present. He is a member of IEEE, IAEE, ISMCDM, and World Transport. He is currently a National Distinguished Chair Professor at the National Chiao Tung University. His research interests include multivariate analysis, routing and scheduling, multiple criteria decision making, and fuzzy theory.