Fuzzy discriminant analysis with outlier detection by genetic algorithm


Computers & Operations Research 31 (2004) 877–888

www.elsevier.com/locate/dsw


Chang-Chun Lin$^{a,*}$, An-Pin Chen$^{b}$

$^{a}$ Department of Information Management, Kun-Shan University of Technology, 949 Da-Wan Road, Yung-Kang, Tainan 710, Taiwan, ROC

$^{b}$ Institute of Information Management, National Chiao-Tung University, Hsinchu 300, Taiwan, ROC

Abstract

This paper proposes a method for performing fuzzy multiple discriminant analysis on groups of crisp data, in which the membership function of each group is determined by minimizing the classification error with a genetic algorithm. Euclidean distance is used to measure the similarity between data points and to define the membership functions. A numerical example is provided for illustration. It indicates that the classification obtained by fuzzy discriminant analysis is more satisfactory than that obtained by crisp discriminant analysis and is less fuzzy than that obtained by fuzzy cluster analysis. Moreover, the proposed fuzzy discriminant analysis is also a good approach to identifying outliers, whose degree of membership to every group is zero.

Keywords: Fuzzy sets; Fuzzy discriminant analysis; Genetic algorithms

1. Introduction

In multivariate analysis, interest often focuses on accounting for the variation in one variable, the criterion or dependent variable, in terms of its covariation with other variables, the predictor or independent variables. Methods for this purpose are classified as dependent methods, and discriminant analysis is an important one. Conventionally, discriminant analysis is divided into two-group discriminant analysis and multiple discriminant analysis. Two-group discriminant analysis is generally conducted using Fisher's method [1]. The basic idea behind Fisher's method is to reduce a large set of multiple measurements to a single linear composite whose values maximally distinguish between members of the two groups. In doing so, a set of multivariate data is transformed into a set of univariate data. The linear composite is called Fisher's discriminant function, and the univariate data are called discriminant scores; each score is the projection of a point onto the discriminant axis.

∗ Corresponding author. Tel./fax: +886-6-273-2726.

E-mail address: [email protected] (C.-C. Lin).

Fisher's method can also be applied to multiple discriminant analysis. Based on Fisher's method, the objective of multiple discriminant analysis is to find linear composites of the original variables that maximize the ratio of among-group variability to within-group variability. Finding these composites is the central problem of multiple discriminant analysis. Notably, the number of discriminant axes depends on the number of groups to be discriminated; with more than two groups it can exceed one, and multiple discriminant scores are then required.
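For the two-group case, Fisher's composite has the well-known closed form $w = S_W^{-1}(\bar{x}_1 - \bar{x}_2)$, where $S_W$ is the pooled within-group scatter matrix and $\bar{x}_g$ are the group means. The plain-Python sketch below illustrates this textbook formula for two-dimensional data only; the function names and the toy data are hypothetical, and the 2×2 inverse is written out by hand merely to avoid external dependencies.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean(points: Sequence[Point]) -> Point:
    """Arithmetic mean (centroid) of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: Sequence[Point], m: Point) -> List[List[float]]:
    """Within-group scatter matrix: sum of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - m[0], p[1] - m[1])
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s


def fisher_direction(g1: Sequence[Point], g2: Sequence[Point]) -> Point:
    """Discriminant axis w = S_W^{-1} (m1 - m2) for two groups of 2-D points."""
    m1, m2 = mean(g1), mean(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    sw = [[s1[r][c] + s2[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    return (inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1])


def score(w: Point, p: Point) -> float:
    """Discriminant score: projection of p onto the axis w."""
    return w[0] * p[0] + w[1] * p[1]


# Toy usage: two separated unit squares.
g1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
g2 = [(3.0, 0.0), (4.0, 0.0), (3.0, 1.0), (4.0, 1.0)]
w = fisher_direction(g1, g2)
```

On the toy data every score of the first group exceeds every score of the second, so a single threshold on the univariate scores separates them.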

Another approach to two-group or multiple discrimination is based on the distances between data points and the centroids of groups. Consider, for example, Mahalanobis' D² method [2]. This method computes the distance from each individual point to the centroid of each group using the pooled within-group covariance matrix, and each point is then assigned to the group to which it is nearest. Clearly, other types of distance may also be used as discriminating tools. However, both Fisher's method and the distance method discriminate data crisply; that is, each point can belong to only a single group. When two groups overlap, some points will be assigned to a group other than the one to which they originally belong. Moreover, neither method indicates how close a point is to each group.
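The distance rule can be sketched in a few lines. The version below is a minimal illustration for two-dimensional data, assuming the usual pooled covariance estimate (summed within-group scatter divided by N minus the number of groups); the function names and toy groups are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_covariance(groups: Dict[str, Sequence[Point]]) -> Matrix:
    """Pooled within-group covariance: summed scatter / (N - number of groups)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    total = 0
    for pts in groups.values():
        c = centroid(pts)
        for p in pts:
            d = (p[0] - c[0], p[1] - c[1])
            for r in range(2):
                for k in range(2):
                    s[r][k] += d[r] * d[k]
        total += len(pts)
    dof = total - len(groups)
    return [[v / dof for v in row] for row in s]


def inverse_2x2(m: Matrix) -> Matrix:
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def mahalanobis_sq(x: Point, c: Point, g_inv: Matrix) -> float:
    """D^2 = (x - c)^T S^{-1} (x - c)."""
    d = (x[0] - c[0], x[1] - c[1])
    return sum(d[r] * g_inv[r][k] * d[k] for r in range(2) for k in range(2))


def classify(x: Point, groups: Dict[str, Sequence[Point]]) -> str:
    """Crisp rule: assign x to the group whose centroid is nearest in D^2."""
    g_inv = inverse_2x2(pooled_covariance(groups))
    return min(groups, key=lambda name: mahalanobis_sq(x, centroid(groups[name]), g_inv))


groups = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "B": [(3.0, 0.0), (4.0, 0.0), (3.0, 1.0), (4.0, 1.0)],
}
```

The function returns a single label, which is exactly the limitation noted above: it says nothing about how close the point is to the other groups.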

According to the idea behind fuzzy cluster analysis, originally proposed by Bezdek [3], data points do not necessarily belong to only one group. A point that belongs to one group may simultaneously belong to other groups with different membership degrees. This idea has been widely accepted by researchers and practitioners. As an extension of classical cluster analysis, fuzzy cluster analysis divides groups based on the similarity between the points in them, which is measured by distance. Notably, however, discriminant analysis differs from cluster analysis. Cluster analysis is an independent method applied when no criterion variable exists, that is, when the memberships of the data points to each group are unknown beforehand. When the criterion variable exists, dependent methods such as discriminant analysis are usually used to classify data groups instead. Discriminant analysis distinguishes groups by minimizing the within-group variability while maximizing the between-group variability, that is, by minimizing the classification error. However, few methods for fuzzy discriminant analysis have been reported. Watada et al. [4] proposed a method for fuzzy discriminant analysis, but it is designed for discriminating among objects that are measured by fuzzy predictor variables and a fuzzy criterion variable. It is therefore not suitable for cases that involve crisp predictor variables and a crisp criterion variable.

This paper presents a method for fuzzy discriminant analysis (FDA). The objective of FDA herein is to determine membership functions of the data groups that distinguish those groups, fuzzily, as well as possible. Fisher's method is inappropriate for this purpose: it involves multiple discriminant variables when there are more than two groups and cannot conveniently define membership functions. For simplicity, Euclidean distance instead of Mahalanobis' D² is used as the measure of similarity between data points. The degree to which a point belongs to a group is given by the group's membership function, which is obtained by minimizing the sum of squared classification errors. Consequently, a point can be regarded as an outlier if its degree of membership to every group is zero.


2. Fuzzy discriminant analysis

Suppose that n points, belonging to m groups, are to be distinguished. The degree to which point j belongs to group i, determined from the group's membership function $\mu_i$, should be as close to the known membership value $y_{ij}$ as possible, while retaining a specific degree of fuzziness. The difference between the membership degree and the known membership value is defined as the classification error at point j. Thus, the objective of FDA is defined herein as minimizing the sum of squared classification errors for each group. That is,

$$\text{minimize } E_i = \sum_{j=1}^{n} \left( \mu_i(D_{ij}^2) - y_{ij} \right)^2, \qquad i = 1, 2, \ldots, m, \tag{1}$$

where $y_{ij}$ is the criterion variable, $y_{ij} \in \{0, 1\}$: $y_{ij} = 1$ if point j originally belongs to group i, and $y_{ij} = 0$ otherwise. $D_{ij}^2$ denotes the distance between point j and the centroid of group i. The distance from a point $x_j$ to the centroid $c_i$ of group i is defined as

$$D_{ij}^2 = (x_j - c_i)^{\top} G \, (x_j - c_i), \tag{2}$$

where the matrix G is symmetric and positive-definite. G is the identity matrix when Euclidean distance is used, and the inverse of the pooled within-group covariance matrix of the data when Mahalanobis' $D^2$ is used. Mahalanobis' $D^2$ is thus a generalization of the familiar Euclidean distance.
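Eq. (2) is an ordinary quadratic form, so it can be evaluated directly once the centroids are known. The plain-Python sketch below (helper names are illustrative, not from the paper) computes a group centroid and $D_{ij}^2$ for an arbitrary symmetric positive-definite G, using G = I for the Euclidean case adopted in this paper and the group-1 points of Table 1.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def centroid(points: Sequence[Vector]) -> List[float]:
    """Component-wise mean c_i of the points in one group."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def quad_distance(x: Vector, c: Vector, g: Matrix) -> float:
    """Eq. (2): D^2 = (x - c)^T G (x - c); G is assumed symmetric positive-definite."""
    d = [xi - ci for xi, ci in zip(x, c)]
    n = len(d)
    return sum(d[r] * g[r][k] * d[k] for r in range(n) for k in range(n))


def identity(dim: int) -> List[List[float]]:
    """G = I turns Eq. (2) into the squared Euclidean distance."""
    return [[1.0 if r == k else 0.0 for k in range(dim)] for r in range(dim)]


# Group 1 of Table 1 (points 1 to 6).
group1 = [(2, 4), (3, 2), (4, 5), (5, 3), (5, 4), (7, 5)]
c1 = centroid(group1)
d2_point5 = quad_distance((5, 4), c1, identity(2))  # 17/36, about 0.472
```

Passing the inverse of a pooled covariance matrix as G instead of identity(2) gives Mahalanobis' $D^2$ with the same function.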

The simplest and most extensively used fuzzy sets, the ramp-type and Z-shaped fuzzy sets, are used to represent the membership function of each group. The membership function of a Z-shaped fuzzy set $\tilde{A}$ is usually defined as

$$\mu_{\tilde{A}}(x) = \begin{cases} 1, & x \le a, \\ \dfrac{b - x}{b - a}, & a \le x \le b, \\ 0, & b \le x. \end{cases} \tag{3}$$

Thus, the membership function of group i can be expressed as

$$\mu_i(D_i^2) = \begin{cases} 1, & D_i^2 \le a_i, \\ \dfrac{b_i - D_i^2}{b_i - a_i}, & a_i < D_i^2 \le b_i, \\ 0, & b_i < D_i^2. \end{cases} \tag{4}$$

If $D_{ij}^2 \le a_i$, point j is considered to belong fully to group i, whereas if $b_i < D_{ij}^2$, point j is considered not to belong to group i. If $a_i < D_{ij}^2 \le b_i$, point j is considered to belong partially to group i, with a degree of membership between 0 and 1. Finally, if $\mu_i(D_{ij}^2) = 0$ for all $i = 1, 2, \ldots, m$, point j does not belong to any group and can be considered an outlier. Restated, the objective of FDA is to find for each group a fuzzy boundary that surrounds it, with a fuzzy radius $\tilde{R}_i = [a_i, b_i]$ from the centroid $c_i$. The region between $a_i$ and $b_i$ represents the fuzzy boundary around group i. The ultimate objective of FDA is to find the $a_i$ and $b_i$ that can maximally distinguish the data groups.
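The membership rule and the outlier criterion translate directly into code. The plain-Python sketch below (function names are illustrative) implements Eq. (4) for one group and flags a point whose membership to every group is zero; the sample parameters are the group-1 values reported later in Eq. (7).

```python
from typing import Sequence


def membership(d2: float, a: float, b: float) -> float:
    """Ramp membership of Eq. (4); assumes a < b."""
    if d2 <= a:
        return 1.0  # inside the core radius a: full membership
    if d2 <= b:
        return (b - d2) / (b - a)  # on the fuzzy boundary: linear decrease
    return 0.0  # beyond the fuzzy radius b: no membership


def is_outlier(memberships: Sequence[float]) -> bool:
    """A point is an outlier when its membership to every group is zero."""
    return all(mu == 0.0 for mu in memberships)


# Parameters of group 1 from Eq. (7): a = 0.98, b = 4.00.
mid = membership(2.49, 0.98, 4.00)  # halfway between a and b, so 0.5
```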


Fig. 1. Membership function for data group i.

Fig. 2. Typical contour diagram of objective function $E_i(a_i, b_i)$.

Fig. 1 depicts the membership function of group i. The data points that belong to group i are those with criterion variable $y_{ij} = 1$. Often, group i overlaps other groups. In such a case, the maximum distance from the points that belong to group i to the centroid may exceed some of the distances from the points that do not belong to group i to the centroid, resulting in the overlapping area in Fig. 1. FDA then seeks the optimal $a_i$ and $b_i$ for group i that minimize the classification errors, especially those that occur in the overlapping area.

The membership function expressed in Eq. (4) can be replaced by the following equivalent closed-form expression, valid for $a_i < b_i$:

$$\mu_i(D_i^2) = \frac{1}{2} \, \frac{|D_i^2 - b_i|}{b_i - a_i} - \frac{1}{2} \, \frac{|D_i^2 - a_i|}{b_i - a_i} + \frac{1}{2}. \tag{5}$$
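Eq. (5) folds the three cases of Eq. (4) into a single expression using absolute values. A quick numerical check, sketched below in plain Python with illustrative function names and the group-1 parameters of Eq. (7), compares the two forms on a grid of distances.

```python
def membership_cases(d2: float, a: float, b: float) -> float:
    """Piecewise form of Eq. (4)."""
    if d2 <= a:
        return 1.0
    if d2 <= b:
        return (b - d2) / (b - a)
    return 0.0


def membership_abs(d2: float, a: float, b: float) -> float:
    """Branch-free form of Eq. (5); valid for a < b."""
    w = b - a
    return 0.5 * abs(d2 - b) / w - 0.5 * abs(d2 - a) / w + 0.5


# Compare the two forms on distances 0.00, 0.01, ..., 6.00 with a = 0.98, b = 4.00.
grid = [k * 0.01 for k in range(601)]
max_gap = max(abs(membership_cases(d, 0.98, 4.00) - membership_abs(d, 0.98, 4.00))
              for d in grid)
```

Below $a_i$ the two absolute-value terms differ by exactly $(b_i - a_i)/2$, giving 1; above $b_i$ they cancel the constant 1/2, giving 0; in between the expression reduces to $(b_i - D_i^2)/(b_i - a_i)$.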

Thus, from Eq. (1), the objective function associated with group i can be expressed as

$$E_i(a_i, b_i) = \sum_{j=1}^{n} \left( \frac{1}{2} \, \frac{|D_{ij}^2 - b_i|}{b_i - a_i} - \frac{1}{2} \, \frac{|D_{ij}^2 - a_i|}{b_i - a_i} + \frac{1}{2} - y_{ij} \right)^2. \tag{6}$$
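Eq. (6) can be evaluated directly from the distances $D_{ij}^2$ and the labels $y_{ij}$. The sketch below uses illustrative names and toy numbers, and assumes $a_i < b_i$.

```python
from typing import Sequence


def membership_abs(d2: float, a: float, b: float) -> float:
    """Eq. (5); assumes a < b."""
    w = b - a
    return 0.5 * abs(d2 - b) / w - 0.5 * abs(d2 - a) / w + 0.5


def objective(a: float, b: float, d2: Sequence[float], y: Sequence[int]) -> float:
    """Eq. (6): sum of squared classification errors of one group."""
    return sum((membership_abs(d, a, b) - yj) ** 2 for d, yj in zip(d2, y))


# Two members of the group (y = 1) and one non-member (y = 0).
# Only the member at distance 1.5 is misfit: membership 0.5, error (0.5 - 1)^2.
err = objective(1.0, 2.0, [0.5, 1.5, 3.0], [1, 1, 0])
```

Because each membership is piecewise linear in $a_i$ and $b_i$, $E_i$ has kinks wherever $a_i$ or $b_i$ crosses one of the distances, which is one reason gradient-based optimizers are awkward here.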


Obviously, this is a bivariate function that is continuous but not differentiable everywhere, and the optimization problem is nonlinear. Fig. 2 shows a typical contour diagram of $E_i(a_i, b_i)$, for example, the error function defined for the membership function in Fig. 1, under the condition $a_i < b_i$. Fig. 2 shows a valley in the direction $d = (-1, 1)$. However, the error function may have several sub-optimal points where the gradient is zero and may sometimes be multimodal. In such cases, FDA becomes a global optimization problem, and genetic algorithms are an effective approach to searching for the global optimal solution.

3. Genetic algorithm

Many optimization problems are very complex and quite hard to solve by conventional optimization techniques. Recently, genetic algorithms have received considerable attention for their potential as an optimization technique for complex problems and have been successfully applied in many areas. Originally proposed and studied by John Holland [5], genetic algorithms are stochastic search techniques based on the mechanisms of natural selection and genetics. Goldberg [6] described the typical form of genetic algorithms. A genetic algorithm starts with an initial set of random solutions called a population. Each individual in the population is a chromosome and represents a solution to the problem to be optimized. A chromosome is a string of symbols that can be decoded into a solution. The chromosomes evolve through successive iterations called generations. During each generation, chromosomes are evaluated according to some measure of fitness, which is related to the objective function of the problem. Fitter chromosomes are more likely to be selected. Two operators are normally used to produce new chromosomes, called offspring, to form a new generation. One is the crossover operator, which merges two chromosomes selected from the current generation to form two new chromosomes. The other is the mutation operator, which modifies some genes of an old chromosome to generate a new chromosome. After several generations, the algorithm converges to the best chromosome, which is hopefully the optimal solution to the problem.
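The operators just described can be sketched on decimal chromosomes of the kind used later in this section. The functions below are illustrative, not the paper's code: roulette wheel selection draws a chromosome with probability proportional to a non-negative fitness, one-point crossover swaps tails after a random cut, and mutation resets each gene to a random digit with a given probability.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # decimal genes, each an integer 0..9


def roulette_select(pop: Sequence[Chromosome], fitness: Sequence[float],
                    rng: random.Random) -> Chromosome:
    """Choose one chromosome with probability proportional to its fitness (>= 0)."""
    r = rng.random() * sum(fitness)
    acc = 0.0
    for chrom, f in zip(pop, fitness):
        acc += f
        if acc > r:
            return chrom
    return pop[-1]  # only reached through floating-point round-off


def one_point_crossover(p1: Chromosome, p2: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after a random cut point."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chrom: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Replace each gene by a random digit with probability rate."""
    return [rng.randint(0, 9) if rng.random() < rate else g for g in chrom]


rng = random.Random(42)
child1, child2 = one_point_crossover([0] * 12, [9] * 12, rng)
```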

The genetic algorithm used in this study employs an evolution process different from that of conventional genetic algorithms. In Holland's original genetic algorithm, all parents are replaced by their offspring to form a new generation; this is called generational replacement. Because genetic algorithms are blind, offspring may be less fit than their parents, so some fitter chromosomes may be lost from the evolutionary process. Several replacement processes have been examined to solve this problem. Holland suggested that when each offspring is born, it replaces an arbitrarily selected chromosome of the current population. DeJong [7] proposed a crowding strategy that selects one parent to die when an offspring is born; the dying parent is the one that most closely resembles the new offspring, as measured by a simple bit-by-bit similarity count. In Gen and Cheng [8], both parents and their immediate offspring are candidates for the new generation. The genetic algorithm used in this study follows a similar strategy to prevent the population from degenerating. Each time an offspring is produced and is fitter than the worst parent, a randomly chosen parent, other than the best one, is replaced. Consequently, the population evolves continually without being replaced wholesale by another population of the same size. Let P(t) be the population at iteration t. The genetic algorithm is described as follows.

begin
    t ← 0;
    initialize P(t); evaluate P(t);
    find the best and the worst chromosome of P(t);
    while t < a predetermined number of iterations do
        select two parents from P(t);
        generate two offspring by crossover and mutation; evaluate the offspring;
        if the offspring is fitter than the worst parent then
            select a parent other than the best one and replace it with the offspring;
            find the best and the worst chromosome of P(t);
        t ← t + 1;
    end
end
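The pseudocode above translates fairly directly into Python. The sketch below is one possible reading, not the authors' code. It uses the 12-digit decimal encoding, roulette wheel selection, one-point crossover, population size 100, crossover rate 0.9 and mutation rate 0.05 given in the next paragraph, and it makes three assumptions of its own, each marked in the comments: fitness is taken as 1/(1 + E), "the worst parent" is read as the worst chromosome of P(t), and each of the two offspring is tested for replacement in turn. The objective toy_error, the shift k = 1.0 and the iteration count are hypothetical.

```python
import random
from typing import Callable, List, Tuple

GENES = 12  # six digits for a_i followed by six digits for b_i


def decode(chrom: List[int], k: float) -> Tuple[float, float]:
    """Each 6-digit block dd.dddd lies in [0, 99.9999]; a is shifted by -k."""
    first = int("".join(map(str, chrom[:6]))) / 1e4
    second = int("".join(map(str, chrom[6:]))) / 1e4
    return first - k, second


def steady_state_ga(error: Callable[[float, float], float], k: float = 1.0,
                    pop_size: int = 100, iterations: int = 2000,
                    p_cross: float = 0.9, p_mut: float = 0.05,
                    seed: int = 0) -> Tuple[float, float, float]:
    """Minimise error(a, b); returns (a, b, error) of the best chromosome."""
    rng = random.Random(seed)

    def fit(c: List[int]) -> float:
        return 1.0 / (1.0 + error(*decode(c, k)))  # assumption: fitness = 1/(1 + E)

    pop = [[rng.randint(0, 9) for _ in range(GENES)] for _ in range(pop_size)]
    fitness = [fit(c) for c in pop]

    def roulette() -> List[int]:
        r, acc = rng.random() * sum(fitness), 0.0
        for c, f in zip(pop, fitness):
            acc += f
            if acc > r:
                return c
        return pop[-1]

    for _ in range(iterations):
        p1, p2 = roulette(), roulette()
        if rng.random() < p_cross:  # one-point crossover
            cut = rng.randint(1, GENES - 1)
            kids = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        else:
            kids = [p1[:], p2[:]]
        for kid in kids:  # assumption: each offspring is tested in turn
            kid = [rng.randint(0, 9) if rng.random() < p_mut else g for g in kid]
            f = fit(kid)
            best = max(range(pop_size), key=fitness.__getitem__)
            worst = min(range(pop_size), key=fitness.__getitem__)  # "worst parent" read as worst of P(t)
            if f > fitness[worst]:
                # Replace a randomly chosen member other than the current best.
                victim = rng.choice([i for i in range(pop_size) if i != best])
                pop[victim], fitness[victim] = kid, f
    best = max(range(pop_size), key=fitness.__getitem__)
    a, b = decode(pop[best], k)
    return a, b, error(a, b)


def toy_error(a: float, b: float) -> float:
    """Hypothetical stand-in for E_i: minimum 0 at a = 1, b = 3; a >= b is penalised."""
    return (a - 1.0) ** 2 + (b - 3.0) ** 2 if a < b else 1e6
```

Because the best chromosome is never chosen for replacement, the best fitness in the population can only stay the same or improve from one iteration to the next.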

Chromosome encoding is an essential part of every genetic algorithm. The genetic algorithm herein uses decimal encoding rather than binary encoding, which greatly reduces the computational load of decoding the genotype into the phenotype. Each chromosome has a length of 12 genes, and each gene is a digit from 0 to 9. The first six digits and the last six digits each represent a real number with six significant figures, including four decimal places. The first real number is mapped onto a predetermined domain of $a_i$, and the second onto a domain of $b_i$. The domain of $a_i$ is set to $[-k, 99.9999 - k]$, where k is a positive number, since $a_i$ may be negative. The domain of $b_i$ is set to $[0, 99.9999]$ because $b_i$ is not likely to be negative. The genetic algorithm in this study uses roulette wheel selection and one-point crossover. The population size is set to 100, the crossover rate to 0.9, and the mutation rate to 0.05. Goldberg [6] and Gen and Cheng [8] describe the algorithms for these operations in detail.

4. Numerical example

Consider the data in Table 1, also depicted in Fig. 3. Fig. 3 shows that groups 1 and 2 overlap, so the two groups cannot be divided clearly. Table 2 lists the primitive results of FDA after the membership function of each group has been obtained by the genetic algorithm described in Section 3. One point, point 12, is identified as an outlier because its degree of membership to every group is zero; Fig. 3 also indicates that point 12 is clearly an outlier. Outliers must be removed from the data to keep them from influencing the results of FDA. Notably, the membership functions must be determined again after the outliers are removed, because the removal alters the centroid of the group. After the outlier is removed from the data, FDA is performed again on group 2. Table 4 compares the results of FDA with those of crisp discrimination based on a comparison of distances.
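The effect of removing the outlier on the centroid of group 2 can be checked directly from Table 1. The sketch below (plain Python 3.8+ for math.dist; variable names are illustrative) computes the group-2 centroid with and without point 12, and the Euclidean distance from point 7 to each.

```python
from math import dist
from typing import Iterable, Tuple

Point = Tuple[float, float]

# Group 2 of Table 1, keyed by sample number (point 12 is the detected outlier).
GROUP2 = {7: (6, 5), 8: (7, 6), 9: (8, 4), 10: (9, 7), 11: (10, 6), 12: (12, 10)}


def centroid(points: Iterable[Point]) -> Point:
    pts = list(points)
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


before = centroid(GROUP2.values())                          # about (8.67, 6.33)
after = centroid(p for n, p in GROUP2.items() if n != 12)   # (8.0, 5.6)
shift_7 = (dist(GROUP2[7], before), dist(GROUP2[7], after))  # about (2.98, 2.09)
```

Removing point 12 pulls the centroid of group 2 toward the rest of the group, and in particular toward point 7.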

Notably, point 6, originally belonging to group 1, is assigned to group 2 by the distance method. According to FDA, the degrees of membership of point 6 to groups 1 and 2 are 0.36 and 1, respectively. These results seem more reasonable than those obtained by the distance method. Fig. 3 also shows that point 6 is closer to group 2 than to group 1, which agrees with its degree of membership to group 2 exceeding its degree of membership to group 1.

Table 1

Data of the numerical example

| No. of sample | y1 | y2 | y3 | x1 | x2 |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 2 | 4 |
| 2 | 1 | 0 | 0 | 3 | 2 |
| 3 | 1 | 0 | 0 | 4 | 5 |
| 4 | 1 | 0 | 0 | 5 | 3 |
| 5 | 1 | 0 | 0 | 5 | 4 |
| 6 | 1 | 0 | 0 | 7 | 5 |
| 7 | 0 | 1 | 0 | 6 | 5 |
| 8 | 0 | 1 | 0 | 7 | 6 |
| 9 | 0 | 1 | 0 | 8 | 4 |
| 10 | 0 | 1 | 0 | 9 | 7 |
| 11 | 0 | 1 | 0 | 10 | 6 |
| 12 | 0 | 1 | 0 | 12 | 10 |
| 13 | 0 | 0 | 1 | 3 | 8 |
| 14 | 0 | 0 | 1 | 3 | 9 |
| 15 | 0 | 0 | 1 | 4 | 8 |
| 16 | 0 | 0 | 1 | 5 | 6 |
| 17 | 0 | 0 | 1 | 6 | 7 |
| 18 | 0 | 0 | 1 | 6 | 8 |

Fig. 3. Scatter plots of the numerical example.

The results for point 7 in Table 4 differ considerably from those in Table 2. Before the outlier is removed, point 7, which originally belongs to group 2, is more likely to belong to group 1 than to group 2: its degree of membership to group 1 exceeds that to group 2 because the centroid of group 2 is farther from the point than the centroid of group 1 is. After the outlier is removed, group 2 becomes more compact and its centroid moves closer to point 7. Thus, the degree of membership of point 7 to group 2 becomes larger than its degree of membership to group 1.

Table 2

Primitive results of fuzzy discriminant analysis

| No. of sample | μ1j | μ2j | μ3j | Sum of membership degrees |
|---|---|---|---|---|
| 1 | 0.55 | 0.00 | 0.00 | 0.55 |
| 2 | 0.57 | 0.00 | 0.00 | 0.57 |
| 3 | 0.92 | 0.00 | 0.00 | 0.92 |
| 4 | 0.97 | 0.00 | 0.00 | 0.97 |
| 5 | 1.00 | 0.00 | 0.00 | 1.00 |
| 6 | 0.36 | 0.65 | 0.00 | 1.01 |
| 7 | 0.65 | 0.32 | 0.00 | 0.97 |
| 8 | 0.19 | 0.82 | 0.00 | 1.01 |
| 9 | 0.11 | 0.54 | 0.00 | 0.65 |
| 10 | 0.00 | 1.00 | 0.00 | 1.00 |
| 11 | 0.00 | 0.95 | 0.00 | 0.95 |
| 12 | 0.00 | 0.00 | 0.00 | 0.00 |
| 13 | 0.00 | 0.00 | 1.00 | 1.00 |
| 14 | 0.00 | 0.00 | 1.00 | 1.00 |
| 15 | 0.00 | 0.00 | 1.00 | 1.00 |
| 16 | 0.57 | 0.05 | 1.00 | 1.62 |
| 17 | 0.14 | 0.41 | 1.00 | 1.55 |
| 18 | 0.00 | 0.26 | 1.00 | 1.26 |

Table 3

Parameters of membership functions

| No. of group | a_i | b_i |
|---|---|---|
| 1 | 0.9825 | 4.0000 |
| 2 | 2.0911 | 2.1503 |
| 3 | 2.3999 | 2.7099 |

The sum of the degrees of membership of a point to all the groups specifies the position of that particular point in relation to the other points. The sum of membership degrees need not be unity; if it were constrained to unity, FDA would be unable to identify outliers. Thus, a point is considered an outlier if the sum of its degrees of membership to all groups is 0. A higher sum of degrees of membership implies that the point is closer to the centroid of all points and is more likely to belong to more than one group simultaneously.
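As a consistency check, the FDA memberships in Table 4 can be recomputed from Table 1 and the parameters in Table 3. The plain-Python sketch below does so; the names DATA, PARAMS and memberships are illustrative. One caveat, based on a quick check rather than on anything stated in the paper: the tabulated values are reproduced when D is taken as the plain (unsquared) Euclidean distance to each group centroid, with point 12 excluded from the centroid of group 2, so the sketch makes that assumption explicit.

```python
from math import dist
from typing import Dict, Tuple

# Table 1: sample number -> (original group, (x1, x2)).
DATA = {1: (1, (2, 4)), 2: (1, (3, 2)), 3: (1, (4, 5)), 4: (1, (5, 3)), 5: (1, (5, 4)),
        6: (1, (7, 5)), 7: (2, (6, 5)), 8: (2, (7, 6)), 9: (2, (8, 4)), 10: (2, (9, 7)),
        11: (2, (10, 6)), 12: (2, (12, 10)), 13: (3, (3, 8)), 14: (3, (3, 9)),
        15: (3, (4, 8)), 16: (3, (5, 6)), 17: (3, (6, 7)), 18: (3, (6, 8))}
PARAMS = {1: (0.9825, 4.0000), 2: (2.0911, 2.1503), 3: (2.3999, 2.7099)}  # Table 3
OUTLIERS = {12}  # removed before the centroids are recomputed


def centroid(group: int) -> Tuple[float, float]:
    pts = [xy for n, (g, xy) in DATA.items() if g == group and n not in OUTLIERS]
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def membership(d: float, a: float, b: float) -> float:
    """Eq. (4); assumes a < b."""
    if d <= a:
        return 1.0
    if d <= b:
        return (b - d) / (b - a)
    return 0.0


CENTROIDS = {g: centroid(g) for g in PARAMS}


def memberships(xy: Tuple[int, int]) -> Dict[int, float]:
    # Assumption: D is the plain (unsquared) Euclidean distance to each centroid.
    return {g: membership(dist(xy, c), *PARAMS[g]) for g, c in CENTROIDS.items()}


table4 = {n: memberships(xy) for n, (_, xy) in DATA.items()}
outliers = [n for n, mu in table4.items() if sum(mu.values()) == 0.0]  # [12]
```

Rounded to two decimals, all three membership columns match Table 4, and point 12 is the only sample whose memberships sum to zero.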

Table 4

Discriminant results by comparing distance and fuzzy discriminant analysis

Columns y1 to y3 give the crisp distance-comparison result; columns μ1j to μ3j give the fuzzy discriminant analysis result.

| No. of sample | y1 | y2 | y3 | μ1j | μ2j | μ3j | Sum of membership degrees |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0.55 | 0.00 | 0.00 | 0.55 |
| 2 | 1 | 0 | 0 | 0.57 | 0.00 | 0.00 | 0.57 |
| 3 | 1 | 0 | 0 | 0.92 | 0.00 | 0.00 | 0.92 |
| 4 | 1 | 0 | 0 | 0.97 | 0.00 | 0.00 | 0.97 |
| 5 | 1 | 0 | 0 | 1.00 | 0.00 | 0.00 | 1.00 |
| 6 | 0 | 1 | 0 | 0.36 | 1.00 | 0.00 | 1.36 |
| 7 | 1 | 0 | 0 | 0.65 | 1.00 | 0.00 | 1.65 |
| 8 | 0 | 1 | 0 | 0.19 | 1.00 | 0.00 | 1.19 |
| 9 | 0 | 1 | 0 | 0.11 | 1.00 | 0.00 | 1.11 |
| 10 | 0 | 1 | 0 | 0.00 | 1.00 | 0.00 | 1.00 |
| 11 | 0 | 1 | 0 | 0.00 | 1.00 | 0.00 | 1.00 |
| 12 | 0 | 1 | 0 | 0.00 | 0.00 | 0.00 | 0.00 |
| 13 | 0 | 0 | 1 | 0.00 | 0.00 | 1.00 | 1.00 |
| 14 | 0 | 0 | 1 | 0.00 | 0.00 | 1.00 | 1.00 |
| 15 | 0 | 0 | 1 | 0.00 | 0.00 | 1.00 | 1.00 |
| 16 | 0 | 0 | 1 | 0.57 | 0.00 | 1.00 | 1.57 |
| 17 | 0 | 0 | 1 | 0.14 | 0.00 | 1.00 | 1.14 |
| 18 | 0 | 0 | 1 | 0.00 | 0.00 | 1.00 | 1.00 |

Table 3 presents the parameters of each membership function. Based on these parameters, the membership function of group 1 is

$$\mu_1(D_1^2) = \begin{cases} 1, & D_1^2 \le 0.98, \\ \dfrac{4.00 - D_1^2}{4.00 - 0.98}, & 0.98 < D_1^2 \le 4.00, \\ 0, & 4.00 < D_1^2. \end{cases} \tag{7}$$

In fact, the fuzzy boundary around group 3 is an empty zone: all points lie either inside or outside it. The minimal distance from the centroid of group 3 to the points that do not belong to group 3 is 2.713, while the maximal distance to the points that do belong to group 3 is 2.007. Consequently, group 3 does not overlap any other group. Notably, the parameters $a_3$ and $b_3$ obtained by the genetic algorithm are 2.3999 and 2.7099, respectively, rather than 2.007 and 2.713: any pair with $2.007 \le a_3 < b_3 \le 2.713$ yields zero error, so the objective function is flat (its gradient is zero) over that region, and the genetic search stops at an arbitrary point inside it. Nevertheless, the value of the objective function is still the optimal value, 0. Table 4 also indicates that the results for group 3 obtained by fuzzy discriminant analysis are identical to those obtained by crisp discriminant analysis. These results indicate that the proposed FDA method degenerates to crisp discriminant analysis when the target group does not overlap the other groups.
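The flat region described above can be checked numerically. The sketch below (illustrative names) recomputes the group-3 distances from Table 1 with point 12 removed and evaluates $E_3$ from Eq. (1). It uses the plain (unsquared) Euclidean distance, which is the choice that reproduces the quoted values 2.007 and 2.713.

```python
from math import dist

# Table 1 without the outlier (point 12), as (x1, x2, y3).
SAMPLES = [(2, 4, 0), (3, 2, 0), (4, 5, 0), (5, 3, 0), (5, 4, 0), (7, 5, 0),
           (6, 5, 0), (7, 6, 0), (8, 4, 0), (9, 7, 0), (10, 6, 0),
           (3, 8, 1), (3, 9, 1), (4, 8, 1), (5, 6, 1), (6, 7, 1), (6, 8, 1)]

members = [(x, y) for x, y, lab in SAMPLES if lab == 1]
c3 = (sum(p[0] for p in members) / len(members), sum(p[1] for p in members) / len(members))
d3 = [dist((x, y), c3) for x, y, _ in SAMPLES]  # unsquared Euclidean distances
y3 = [lab for _, _, lab in SAMPLES]


def membership(d: float, a: float, b: float) -> float:
    """Eq. (4); assumes a < b."""
    if d <= a:
        return 1.0
    if d <= b:
        return (b - d) / (b - a)
    return 0.0


def e3(a: float, b: float) -> float:
    """Eq. (1) for group 3."""
    return sum((membership(d, a, b) - y) ** 2 for d, y in zip(d3, y3))


inner = max(d for d, y in zip(d3, y3) if y == 1)  # farthest member, about 2.007
outer = min(d for d, y in zip(d3, y3) if y == 0)  # nearest non-member, about 2.713
```

Both the reported pair (2.3999, 2.7099) and any other pair inside [inner, outer] give $E_3 = 0$, while moving $a_3$ below inner or $b_3$ above outer makes the error positive.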

Fuzzy c-means is still the most frequently used alternative to fuzzy discriminant analysis for fuzzily classifying data groups, even when the memberships of the samples are provided, because no method of fuzzy discriminant analysis for crisp data has yet been reported in the literature. In fuzzy c-means, the memberships of a particular sample to all the groups are constrained to sum to unity. This constraint causes a serious problem: performance is inadequate when the data are contaminated by noise and outliers. Robust fuzzy clustering methods have been proposed to overcome the problem of outliers, such as those of Ohashi [9], Dave [10,11], Krishnapuram and Keller [12], Frigui and Krishnapuram [13], and Ménard [14]. A review of robust clustering methods is given by Dave and Krishnapuram [15]. These methods depend on different criteria to determine whether a point is an outlier. Different criteria often lead to different outliers, so comparing the outlier-detection capabilities of different methods is sometimes difficult. However, the outlier problem in fuzzy clustering differs from that in FDA because of the existence of the criterion variable. In a clustering problem where the number of clusters is unclear, outliers may either form another cluster or simply be regarded as outliers. But in a discriminant problem, where the number of groups is fixed, points that do not belong to any group can only be outliers. Thus, comparing the outlier-detection capabilities of fuzzy clustering methods and FDA is not very appropriate.

Table 5

Clustering results by fuzzy c-means

| No. of sample | μ1j | μ2j | μ3j | Original group | Classification |
|---|---|---|---|---|---|
| 1 | 0.73 | 0.10 | 0.17 | 1 | 1 |
| 2 | 0.80 | 0.10 | 0.10 | 1 | 1 |
| 3 | 0.78 | 0.09 | 0.13 | 1 | 1 |
| 4 | 0.88 | 0.08 | 0.04 | 1 | 1 |
| 5 | 0.89 | 0.07 | 0.04 | 1 | 1 |
| 6 | 0.09 | 0.87 | 0.05 | 1 | 2 |
| 7 | 0.35 | 0.52 | 0.13 | 2 | 2 |
| 8 | 0.04 | 0.92 | 0.04 | 2 | 2 |
| 9 | 0.16 | 0.77 | 0.07 | 2 | 2 |
| 10 | 0.08 | 0.82 | 0.10 | 2 | 2 |
| 11 | 0.11 | 0.79 | 0.10 | 2 | 2 |
| 13 | 0.03 | 0.02 | 0.94 | 3 | 3 |
| 14 | 0.05 | 0.05 | 0.90 | 3 | 3 |
| 15 | 0.00 | 0.00 | 1.00 | 3 | 3 |
| 16 | 0.35 | 0.27 | 0.37 | 3 | 3 |
| 17 | 0.15 | 0.47 | 0.38 | 3 | 2 |
| 18 | 0.12 | 0.32 | 0.56 | 3 | 3 |
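The sum-to-one constraint comes from the standard fuzzy c-means membership update (Bezdek [3]), $u_{ij} = 1 / \sum_k (d_{ij}/d_{kj})^{2/(q-1)}$ with fuzzifier $q > 1$. The sketch below applies one such update with the centers held fixed at the example's group centroids; this is an illustrative assumption, since real fuzzy c-means alternates this step with a center update. Function and variable names are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fcm_memberships(x: Point, centers: Sequence[Point], fuzzifier: float = 2.0) -> List[float]:
    """One fuzzy c-means membership update for a single point.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (fuzzifier - 1)); the u_i always sum to 1.
    """
    d = [dist(x, c) for c in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # point coincides with a center: share membership among those centers
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    p = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


# Centers held fixed at the example's group centroids (point 12 excluded).
centers = [(26 / 6, 23 / 6), (8.0, 5.6), (4.5, 46 / 6)]
u12 = fcm_memberships((12, 10), centers)  # memberships of the outlier, point 12
```

Even though point 12 lies far from every center, its memberships still sum to one, and more than half of that total goes to group 2, which is why a sum-to-one method cannot flag it the way FDA does.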

Table 5 presents the results obtained by fuzzy c-means after the outlier, point 12, is removed from the data set, to show what results would be obtained if fuzzy c-means were used instead of FDA. Generally, the results obtained by fuzzy c-means are reasonably consistent with those obtained by FDA, except for points 6 and 17. Classifying point 6 into group 2 is understandable, but classifying point 17 into group 2 seems inappropriate. Thus, the results obtained by FDA are comparatively more reasonable than those obtained by fuzzy c-means. On the other hand, Table 5 also shows that the results obtained by fuzzy c-means are fuzzier than those obtained by FDA. This finding can be verified using partition entropy as a measure of fuzziness. Partition entropy is a scalar defined as

$$H(\tilde{U}; m) = -\frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{m} \mu_{ij} \ln(\mu_{ij}), \tag{8}$$

where $\tilde{U}$ is a fuzzy partition matrix with entries $\mu_{ij}$, and $0 \ln 0$ is taken as 0. A larger partition entropy corresponds to a fuzzier partition. The partition entropy of fuzzy discriminant analysis is 0.1413, whereas that of fuzzy c-means is 0.5754. The larger partition entropy of fuzzy c-means indicates that its results are fuzzier than those obtained using fuzzy discriminant analysis. Less fuzzy partition results provide more information about the structure of clusters than fuzzier ones do. Consequently, FDA outperforms fuzzy c-means provided that the criterion variable values of the data points are known beforehand, since it provides less fuzzy partition results.
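Eq. (8) is straightforward to evaluate. The sketch below (function name illustrative) applies it to the rounded FDA memberships of Table 4, counting all 18 samples including the outlier's row of zeros. That choice of n is an assumption; with it the result is about 0.142, close to the reported 0.1413, and the small gap is plausibly due to rounding in the table.

```python
from math import log
from typing import Sequence


def partition_entropy(u: Sequence[Sequence[float]]) -> float:
    """Eq. (8): u[j][i] is the membership of point j in group i; 0 ln 0 is taken as 0."""
    return -sum(mu * log(mu) for row in u for mu in row if mu > 0.0) / len(u)


# FDA memberships from Table 4 (rows are samples 1 to 18, columns are groups 1 to 3).
TABLE4 = [
    [0.55, 0, 0], [0.57, 0, 0], [0.92, 0, 0], [0.97, 0, 0], [1, 0, 0], [0.36, 1, 0],
    [0.65, 1, 0], [0.19, 1, 0], [0.11, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0],
    [0, 0, 1], [0, 0, 1], [0, 0, 1], [0.57, 0, 1], [0.14, 0, 1], [0, 0, 1],
]
h_fda = partition_entropy(TABLE4)  # about 0.142
```

Only the fractional memberships contribute; every 0 and 1 adds nothing, which is why a nearly crisp partition scores close to zero.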

5. Conclusions

This study proposed a novel approach to fuzzy discriminant analysis for groups of crisp data, which may overlap. Fuzzy discriminant analysis considers that a point that lies in an overlapping area and is classified as belonging to one group may also belong to another group, to a greater or lesser extent. The extent to which a point belongs to a certain group, its membership degree, cannot be obtained by conventional crisp discriminant analysis but can be obtained by fuzzy discriminant analysis. However, minimizing the classification error is difficult because the objective function is complicated and nonlinear. This study shows that genetic algorithms are a good approach to solving this problem. The numerical example shows that the proposed method can determine the membership function of each data group when it overlaps another group. However, when a group does not overlap any other group, the results of discrimination are identical to those obtained by crisp discriminant analysis.

Comparing the partition results obtained by fuzzy discriminant analysis with those obtained by fuzzy c-means reveals that fuzzy discriminant analysis provides clearer structural information about the clusters. Another advantage of fuzzy discriminant analysis is its capability of identifying outliers: when a point is sufficiently far from the others that its degree of membership to each group is zero, it is regarded as an outlier. Notably, the membership functions must be determined again after the outliers are removed from a group. Crisp discriminant analysis is an important dependent method for multivariate analysis and has numerous applications. All existing applications of crisp discriminant analysis are potential applications of fuzzy discriminant analysis, which need further investigation.

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments, which helped to improve the presentation of this paper.


References

[1] Fisher RA. The use of multiple measurements in taxonomic problems. Annals of Eugenics 1936;7:179–88.

[2] Mahalanobis PC. On the generalized distance in statistics. Proceedings of the National Institute of Science, Calcutta, 1936;12:49–55.

[3] Bezdek JC. Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press, 1981.

[4] Watada J, Tanaka H, Asai K. Fuzzy discriminant analysis in fuzzy groups. Fuzzy Sets and Systems 1986;19:261–71.

[5] Holland JH. Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press, 1975.

[6] Goldberg DE. Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley, 1989.

[7] DeJong K. An analysis of the behavior of a class of genetic adaptive systems. Ph.D. thesis, University of Michigan, USA, 1975.

[8] Gen M, Cheng R. Genetic algorithms and engineering design. New York: Wiley, 1997.

[9] Ohashi Y. Fuzzy clustering and robust estimation. In: Ninth Meeting of SAS Users Group International, Florida, 1984.

[10] Dave RN. Characterization and detection of noise in clustering. Pattern Recognition Letters 1991;12:657–64.

[11] Dave RN. Robust fuzzy clustering algorithms. In: Proceedings of the Second International Conference on Fuzzy Systems, San Francisco, 1993. p. 1281–6.

[12] Krishnapuram R, Keller JM. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems 1993;1:98–110.

[13] Frigui H, Krishnapuram R. A robust algorithm for automatic extraction of an unknown number of clusters from noisy data. Pattern Recognition Letters 1996;17:1223–32.

[14] Ménard M. Fuzzy clustering and switching regression models using ambiguity and distance rejects. Fuzzy Sets and Systems 2001;122:363–99.

[15] Dave RN, Krishnapuram R. Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 1997;5:270–93.