ELSEVIER Information Processing Letters 52 (1994) 259-263
A fuzzy clustering algorithm for graph bisection
Jin-Tai Yan *, Pei-Yung Hsiao
Department of Computer and Information Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan, ROC
Communicated by D. Gries; received 27 August 1993; revised 14 June 1994
Abstract
A fuzzy clustering algorithm based on global connection information is proposed to solve the graph bisection problem.
Keywords: Algorithms; Graph bisection; Fuzzy membership; Fuzzy clustering
1. Introduction
Let $G = (V, E)$ be an undirected connected edge-weighted graph. In general, a partition of $G$ is a partition of its vertex set $V$. Hence, if the ends of an edge $e$ in $E$ belong to different subsets of a partition $(V_1, V_2)$ of $V$, $e$ is cut by the partition. The cut of a partition $(V_1, V_2)$ of graph $G$ is defined as the sum of the weights of all the edges cut by the partition:

$$\mathrm{cut}(V_1, V_2) = \sum_{i \in V_1} \sum_{j \in V_2} c_{ij},$$

where $c_{ij}$ is the weight of the edge $\{i, j\}$ in $G$. Therefore, a min-cut partition for graph $G$ is a partition $(V_1, V_2)$ of $V$ with minimum cut. However, a min-cut partition usually yields an unbalanced partition, and an unbalanced partition is inefficient in many applications. Therefore, balanced-partition graph bisection is formulated as follows: A partition $(V_1, V_2)$ of $V$ is said to be a graph bisection (GB) if $|V_1| = |V_2|$ when $|V|$ is even or $|V_1| = |V_2| - 1$ when $|V|$ is odd.

Due to the size constraint on the partition, GB is NP-complete [4]. Many heuristic approaches have been suggested for GB. In 1970, Kernighan and Lin [5] proposed a two-way "group-migration" improvement algorithm with a constraint on the subset size. They randomly started with two subsets and iteratively applied pairwise swapping on all pairs of nodes. Subsequently, Fiduccia and Mattheyses [3] reduced the time complexity to $O(P)$ with respect to the number of pins $P$. Saab and Rao [7,8] also proposed heuristic algorithms to solve GB. Generally speaking, the Kernighan-Lin based algorithm [1] is quite efficient, but it does not focus on the global connection information of the given graph. Therefore, it is difficult for it to obtain an optimal or near-optimal graph bisection.

* Corresponding author.
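The cut and the size constraint defined in this section can be made concrete with a short sketch. The names `cut_weight` and `is_bisection`, the dictionary representation of edge weights, and the 4-cycle example are illustrative assumptions, not part of the paper. The size check accepts either subset as the smaller one when the number of vertices is odd.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def cut_weight(weights: Dict[Edge, float], part1: Set[int], part2: Set[int]) -> float:
    """Sum the weights c_ij of all edges with one end in each subset."""
    total = 0.0
    for (i, j), c in weights.items():
        if (i in part1 and j in part2) or (i in part2 and j in part1):
            total += c
    return total


def is_bisection(part1: Set[int], part2: Set[int], n: int) -> bool:
    """Check that the subsets are disjoint, cover n vertices, and are balanced.

    Balanced means equal sizes when n is even and sizes differing by one
    when n is odd.
    """
    if part1 & part2 or len(part1) + len(part2) != n:
        return False
    return abs(len(part1) - len(part2)) == n % 2


# Hypothetical example: a 4-cycle with two heavy and two light edges.
CYCLE_WEIGHTS: Dict[Edge, float] = {(0, 1): 3.0, (1, 2): 1.0, (2, 3): 3.0, (3, 0): 1.0}
```

On this example, the bisection `({0, 1}, {2, 3})` cuts only the two light edges, while `({0, 3}, {1, 2})` cuts the two heavy ones, so the first is the better bisection.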
In this paper, we propose a fuzzy clustering algorithm based on global connection information to solve GB. For an undirected connected edge-weighted graph, we introduce two groups of fuzzy memberships on the vertex set and define the clustering distance between any pair of vertices in the graph according to global connection. Based on fuzzy c-means clustering [2,6], two-way fuzzy graph clustering generates two groups of converged fuzzy memberships for the vertex set. Finally, according to the grades of the memberships, the vertices in the graph can be separated into two even subsets with minimum cut.

0020-0190/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
2. Fuzzy membership on vertices

Given is an undirected edge-weighted graph $G = (V, E)$, where $V = \{x_1, x_2, \ldots, x_n\}$ and $E = \{y_1, y_2, \ldots, y_m\}$. Let $\mathbb{R}^+$ be the set of nonnegative reals and $M_{2n}$ the set of real $2 \times n$ matrices. First, fuzzy memberships and fuzzy functions for the vertex set $V$ are introduced.

Every fuzzy function $u_i : V \to [0, 1]$ assigns grades of fuzzy memberships onto the vertices in $V$. Function $u_i$ is called the $i$th fuzzy set in $V$. There are infinitely many fuzzy sets associated with $V$. Every fuzzy set in $V$ represents a possible fuzzy clustering. Hence, for two-way partitioning, two fuzzy sets in $V$ will be applied to partition the vertex set $V$.

In order to partition $V$ by means of fuzzy sets, we need some clustering constraints between the two fuzzy sets. For example, for each $x_k$ in $V$, the sum of the fuzzy memberships in the two fuzzy sets is restricted to be 1. Formally, a two-way fuzzy clustering for two-way partitioning can be represented by a fuzzy matrix $U$ in $M_{2n}$ whose entries satisfy the following clustering constraints:
(1) Row $i$ of $U$, say $U_i = (u_{i1}, u_{i2}, \ldots, u_{in})$, exhibits fuzzy set $i$ of $V$.
(2) Column $j$ of $U$, say $U^{j} = (u_{1j}, u_{2j})$, exhibits the values of the 2 fuzzy sets of the $j$th datum in $V$.
(3) $u_{ik}$ shall be interpreted as $u_i(x_k)$, the value of fuzzy set $i$ for the $k$th datum.
(4) The sum of the membership values for each $x_k$ is 1 ($u_{1k} + u_{2k} = 1$, for all $k$).
(5) No fuzzy set is empty (row sum $\sum_k u_{ik} > 0$, for all $i$).
(6) No fuzzy set is all of $V$ (row sum $\sum_k u_{ik} < n$, for all $i$).

3. Clustering distance

Because graph partitioning is primarily a min-cut operation, any pair of connected vertices with a larger edge weight should be clustered into the same cluster to reduce the cut of the partitioning result. Hence, for graph partitioning, the larger the weight of the edge, the less its clustering distance. A related clustering graph can be generated by modifying the edge weights of the original edge-weighted graph. For $G = (V, E)$, the related clustering graph $G^* = (V^*, E^*)$ is an undirected edge-weighted graph, where $V^* = V$, $E^* = E$, and the edge weight $c^*_{ij}$ of the edge $\{i, j\}$ is defined by

$$c^*_{ij} = 1 / c_{ij}.$$

Since there is no geometrical distance between any pair of vertices in a graph, it is critical for fuzzy clustering on a graph structure to estimate the clustering distance of any pair of vertices. Simply speaking, for an undirected edge-weighted graph, the distance of any pair of vertices in the graph is the distance of the shortest path. Furthermore, the clustering distance of any pair of vertices in the related clustering graph can be computed by running a shortest-path algorithm. The clustering distance will indicate the clustered degree of the pair of vertices in the same cluster. For any pair of vertices $i$ and $j$ in $G^*$, the clustering distance $d^*_{ij}$ between vertex $i$ and $j$ can be further obtained as

$$d^*_{ij} = \begin{cases} c^*_{ij} & \text{if } \{i, j\} \text{ is an edge in the graph}, \\ \mathrm{Short\_path}(i, j) & \text{if } \{i, j\} \text{ is not an edge in the graph}, \end{cases}$$

where $\mathrm{Short\_path}(s, t)$ is the sum of the weights on the shortest path from vertex $s$ to vertex $t$. Clearly, the clustering distance of all pairs of vertices in the graph must be obtained for fuzzy clustering. Hence, all clustering distances can be computed by running an all-pairs shortest-path algorithm.

4. Optimality of fuzzy clustering

Based on fuzzy c-means clustering, two-way fuzzy graph clustering can be transformed into a mathematical optimization problem for the mapped objective function. Using the fuzzy memberships of the vertex set and the clustering distance between any pair of vertices, the objective function for two-way fuzzy graph clustering can be formulated as follows: Let $U$ in $M_{2n}$ be a fuzzy graph partition of $V$, and let $v = (v_1, v_2)$ be the cluster centers. Objective function $J_i : M_{2n} \times V \to \mathbb{R}^+$ is defined as

$$J_i(U, v_i) = \sum_{k=1}^{n} (u_{ik})^2 (d_{ik})^2.$$

Further, objective function $J : M_{2n} \times V^2 \to \mathbb{R}^+$ is defined as

$$J(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{2} (u_{ik})^2 (d_{ik})^2,$$

where $U$ in $M_{2n}$ is a fuzzy graph clustering of $V$, $v = (v_1, v_2)$ in $V^2$ is the pair of cluster centers, and $d_{ik} = \| x_k - v_i \|$ is the clustering distance between $x_k$ and $v_i$. Note the several parameters in the definition of the objective function. The squared clustering distance is weighted by the second power of the membership of datum $x_k$ in cluster $i$. Thus, function $J$ is a squared error criterion, and its minimization produces a fuzzy clustering matrix $U$ that is optimal in a generalized least-squared error sense.
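As an illustration of how the clustering distance of Section 3 feeds the objective function $J$, the sketch below computes all-pairs distances on the reciprocal-weight graph with the Floyd-Warshall algorithm and then evaluates $J$ for given memberships and centers. It assumes positive edge weights, assumes that adjacent vertices keep their reciprocal edge weight as their distance, and represents cluster centers by vertex indices. The function names and the path-graph example are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Matrix = List[List[float]]


def clustering_distances(n: int, weights: Dict[Edge, float]) -> Matrix:
    """All-pairs clustering distances under edge weights c*_ij = 1 / c_ij."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), c in weights.items():
        d[i][j] = d[j][i] = 1.0 / c
    # Floyd-Warshall: relax every pair through each intermediate vertex m.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    # Assumption: adjacent pairs use the direct reciprocal weight.
    for (i, j), c in weights.items():
        d[i][j] = d[j][i] = 1.0 / c
    return d


def objective(u: Matrix, d: Matrix, centers: Tuple[int, int]) -> float:
    """J(U, v): sum over clusters i and vertices k of u_ik^2 * d(v_i, x_k)^2."""
    n = len(d)
    return sum(
        (u[i][k] ** 2) * (d[centers[i]][k] ** 2)
        for i in range(2)
        for k in range(n)
    )


# Hypothetical example: the path 0 - 1 - 2 - 3 with a light middle edge.
PATH_WEIGHTS: Dict[Edge, float] = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 2.0}
D_STAR = clustering_distances(4, PATH_WEIGHTS)
CRISP_U: Matrix = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
J_VALUE = objective(CRISP_U, D_STAR, (0, 3))
```

Heavy edges become short distances, so vertices 0 and 1 (and likewise 2 and 3) end up close to each other, which is the behaviour the reciprocal weighting is meant to produce.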
Since two-way fuzzy graph clustering can be transformed into a mathematical optimization problem for the mapped objective function, two-way fuzzy graph clustering can be stated as an approach that attempts to find a solution for the following mathematical program:

$$\text{Minimize } J(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{2} (u_{ik})^2 (d_{ik})^2$$

subject to $u_{1k} + u_{2k} = 1$, $u_{ik} \ge 0$, $1 \le i \le 2$, $1 \le k \le n$, where

$x_i \in V$, $1 \le i \le n$, are vertices in the graph,
$v_j \in V$, $1 \le j \le 2$, are unknown cluster centers,
$U = \{u_{ik}\}$ is a $2 \times n$ matrix, where $u_{ik}$ is referred to as the grade of membership of $x_k$ in row $i$ of matrix $U$.

Objective function $J$ is a nonlinear multi-variable function, and it is difficult for two-way fuzzy graph clustering to obtain an optimal matrix $U$. For minimizing $J$, iterative optimization on $U$ and clustering center $v$ can be applied to approximate the minima of the function. In the following lemmas, we discuss necessary conditions on $U$ and $v$ for the mapped objective function.
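Lemma 4.1. Consider the following problem:

$$\text{Minimize } J(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{2} (u_{ik})^2 (d_{ik})^2$$

subject to $u_{1k} + u_{2k} = 1$, $1 \le k \le n$, $u_{ik} \ge 0$, $1 \le i \le 2$, $1 \le k \le n$, where $v$ is fixed. Then $U = \{u_{ik}\}$ is a global minimum of the problem if for $1 \le k \le n$,

if $x_k \ne v_1$ and $x_k \ne v_2$ then

$$u_{ik} = \frac{d_{1k}^2 \, d_{2k}^2}{d_{ik}^2 \, (d_{1k}^2 + d_{2k}^2)} \quad (\text{for } 1 \le i \le 2),$$

else

$$u_{ik} = \begin{cases} 1 & \text{if } x_k = v_i, \\ 0 & \text{if } x_k \ne v_i \end{cases} \quad (\text{for } 1 \le i \le 2).$$

Proof. By the definition of fuzzy membership, the columns in matrix $U$ are independent. Therefore,

$$\min\{J(U, v)\} = \min\left\{ \sum_{k=1}^{n} \sum_{i=1}^{2} (u_{ik})^2 (d_{ik})^2 \right\} = \sum_{k=1}^{n} \min\left\{ \sum_{i=1}^{2} (u_{ik})^2 (d_{ik})^2 \right\}.$$

As mentioned in the previous definitions, the restricted condition for each column in $U$ is $\sum_{i=1}^{2} u_{ik} = 1$. Further, the minimum function can be formulated as a function $F$ and solved by the Lagrange multiplier method,

$$F(u_{1k}, u_{2k}, \lambda) = \sum_{i=1}^{2} (u_{ik})^2 (d_{ik})^2 - \lambda \left( \sum_{i=1}^{2} u_{ik} - 1 \right).$$

The first-order sufficient and necessary conditions for optimality are

$$\frac{\partial F}{\partial \lambda} = 1 - \sum_{i=1}^{2} u_{ik} = 0, \tag{1}$$

$$\frac{\partial F}{\partial u_{ik}} = \left[ 2 (u_{ik}) (d_{ik})^2 - \lambda \right] = 0. \tag{2}$$

By (2), we obtain

$$u_{ik} = \frac{\lambda}{2 (d_{ik})^2}. \tag{3}$$

Substituting (3) into (1) gives

$$\frac{\lambda}{2} \cdot \frac{d_{1k}^2 + d_{2k}^2}{d_{1k}^2 \, d_{2k}^2} = 1.$$

Therefore,

$$\frac{\lambda}{2} = \frac{d_{1k}^2 \, d_{2k}^2}{d_{1k}^2 + d_{2k}^2}. \tag{4}$$

Substituting (4) into (3), we obtain

$$u_{ik} = \frac{1}{(d_{ik})^2} \cdot \frac{d_{1k}^2 \, d_{2k}^2}{d_{1k}^2 + d_{2k}^2} = \frac{d_{1k}^2 \, d_{2k}^2}{d_{ik}^2 \, (d_{1k}^2 + d_{2k}^2)}.$$

The fuzzy membership assignment can be further classified into two different cases. If $x_k$ corresponds to $v_i$, the fuzzy membership of $x_k$ on cluster $i$ is 1 and that on the other cluster is 0. Thus, $u_{ik}$ is assigned as follows: for $1 \le k \le n$,

if $x_k \ne v_1$ and $x_k \ne v_2$ then

$$u_{ik} = \frac{d_{1k}^2 \, d_{2k}^2}{d_{ik}^2 \, (d_{1k}^2 + d_{2k}^2)} \quad (\text{for } 1 \le i \le 2),$$

else

$$u_{ik} = \begin{cases} 1 & \text{if } x_k = v_i, \\ 0 & \text{if } x_k \ne v_i \end{cases} \quad (\text{for } 1 \le i \le 2). \qquad \Box$$

Lemma 4.2. Consider the following problem: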
$$\text{Minimize } J(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{2} (u_{ik})^2 (d_{ik})^2$$

subject to $u_{1k} + u_{2k} = 1$, $1 \le k \le n$, $u_{ik} \ge 0$, $1 \le i \le 2$, $1 \le k \le n$, where $U$ is fixed. Then $v = (v_1, v_2)$ is a global minimum of the problem if $v_i$ is in $V$ such that $J_i(U, v_i)$ is the least.

Proof. Due to $U$ being fixed, all rows in matrix $U$ are independent. Therefore,

$$\min\{J(U, v)\} = \min\{J_1(U, v_1) + J_2(U, v_2)\} = \min\{J_1(U, v_1)\} + \min\{J_2(U, v_2)\}.$$

Furthermore, the minimization of $J(U, v)$ will depend on the minimization of $\min\{J_1(U, v_1)\} + \min\{J_2(U, v_2)\}$. Thus, the center of cluster $i$ for $1 \le i \le 2$ can be assigned by $v_i$ such that $J_i(U, v_i)$ is the least. $\Box$
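The two lemmas translate into two small update rules, sketched below under stated assumptions: centers are vertex indices, the two centers are distinct, `d` is a precomputed matrix of clustering distances with positive off-diagonal entries, and the helper names are hypothetical. The membership formula is written in its simplified form, since $d_{1k}^2 d_{2k}^2 / (d_{1k}^2 (d_{1k}^2 + d_{2k}^2)) = d_{2k}^2 / (d_{1k}^2 + d_{2k}^2)$.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def update_memberships(d: Matrix, centers: Tuple[int, int]) -> Matrix:
    """Lemma 4.1 sketch: memberships that minimize J for fixed, distinct centers."""
    n = len(d)
    v1, v2 = centers
    u = [[0.0] * n for _ in range(2)]
    for k in range(n):
        if k == v1 or k == v2:
            # A center belongs fully to its own cluster.
            u[0][k] = 1.0 if k == v1 else 0.0
            u[1][k] = 1.0 - u[0][k]
        else:
            a = d[v1][k] ** 2  # squared distance to center 1
            b = d[v2][k] ** 2  # squared distance to center 2
            u[0][k] = b / (a + b)
            u[1][k] = a / (a + b)
    return u


def best_center(u_row: Sequence[float], d: Matrix) -> int:
    """Lemma 4.2 sketch: the vertex v that minimizes J_i(U, v) for one fixed row."""
    n = len(d)
    return min(
        range(n),
        key=lambda v: sum((u_row[k] ** 2) * (d[v][k] ** 2) for k in range(n)),
    )


# Hypothetical distances for the path 0 - 1 - 2 - 3 with reciprocal weights
# 0.5, 1.0 and 0.5 on its three edges.
PATH_D: Matrix = [
    [0.0, 0.5, 1.5, 2.0],
    [0.5, 0.0, 1.0, 1.5],
    [1.5, 1.0, 0.0, 0.5],
    [2.0, 1.5, 0.5, 0.0],
]
U_NEW = update_memberships(PATH_D, (0, 3))
```

With centers 0 and 3, vertex 1 has squared distances 0.25 and 2.25, so its membership in the first cluster is 2.25 / 2.5 = 0.9. Lemma 4.2 chooses each center independently, so nothing in it prevents both clusters from selecting the same vertex; the sketch leaves that case to the caller.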
5. Fuzzy clustering and graph bisection
According to Lemmas 4.1 and 4.2, two-way fuzzy graph clustering, via iterative optimization of $J(U, v)$ on $U$ and $v$, produces a feasible fuzzy graph partition of $V = \{x_1, x_2, \ldots, x_n\}$. The basic steps of the algorithm are as follows:

Algorithm Fuzzy Graph Clustering

1. Determine the clustering distance $d^*_{ij}$ between $x_i$ and $x_j$, $1 \le i, j \le n$.
2. Initialize an arbitrary partition and establish a fuzzy matrix $U$.
3. Calculate the centers $v = (v_1, v_2)$ using $U$ as follows:
   (1) Determine $v_1$ such that $J_1(U, v_1)$ is the least,
   (2) Determine $v_2$ such that $J_2(U, v_2)$ is the least.
4. Calculate a new fuzzy matrix $U'$ using $v = (v_1, v_2)$ as follows: for $1 \le k \le n$,
   if $x_k \ne v_1$ and $x_k \ne v_2$ then
   $$u'_{ik} := \frac{d_{1k}^2 \, d_{2k}^2}{d_{ik}^2 \, (d_{1k}^2 + d_{2k}^2)} \quad (\text{for } 1 \le i \le 2),$$
   else
   $$u'_{ik} := \begin{cases} 1 & \text{if } x_k = v_i, \\ 0 & \text{if } x_k \ne v_i \end{cases} \quad (\text{for } 1 \le i \le 2).$$
5. If $|u'_{ik} - u_{ik}| < \varepsilon$, for $1 \le i \le 2$, $1 \le k \le n$, then stop; otherwise, $U := U'$, and repeat at step 3.
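Steps 2 to 5 of the algorithm can be sketched as a single loop. This is a minimal illustration, not the authors' implementation: it takes the clustering distances of step 1 as a precomputed matrix `d`, builds the initial matrix from a crisp partition, represents centers by vertex indices, and adds two safeguards that the algorithm description does not mention (an iteration cap `max_iter` and an error when both clusters select the same center). All names are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def fuzzy_graph_clustering(
    d: Matrix, init_part1: Set[int], eps: float = 1e-6, max_iter: int = 100
) -> Tuple[Matrix, List[int]]:
    """Alternate the center update (step 3) and the membership update (step 4)."""
    n = len(d)
    # Step 2: a crisp 0/1 matrix built from an arbitrary initial partition.
    u = [
        [1.0 if k in init_part1 else 0.0 for k in range(n)],
        [0.0 if k in init_part1 else 1.0 for k in range(n)],
    ]
    centers: List[int] = []
    for _ in range(max_iter):
        # Step 3: each center is the vertex that minimizes J_i for its row.
        centers = [
            min(
                range(n),
                key=lambda v, row=u[i]: sum(
                    (row[k] ** 2) * (d[v][k] ** 2) for k in range(n)
                ),
            )
            for i in range(2)
        ]
        v1, v2 = centers
        if v1 == v2:
            # Safeguard added for this sketch only.
            raise ValueError("both clusters selected the same center")
        # Step 4: new memberships from the current centers.
        new_u = [[0.0] * n for _ in range(2)]
        for k in range(n):
            if k == v1 or k == v2:
                new_u[0][k] = 1.0 if k == v1 else 0.0
                new_u[1][k] = 1.0 - new_u[0][k]
            else:
                a, b = d[v1][k] ** 2, d[v2][k] ** 2
                new_u[0][k] = b / (a + b)
                new_u[1][k] = a / (a + b)
        # Step 5: stop once every membership changed by less than eps.
        delta = max(abs(new_u[i][k] - u[i][k]) for i in range(2) for k in range(n))
        u = new_u
        if delta < eps:
            break
    return u, centers


# Hypothetical distances for the path 0 - 1 - 2 - 3 (reciprocal edge weights).
PATH_D: Matrix = [
    [0.0, 0.5, 1.5, 2.0],
    [0.5, 0.0, 1.0, 1.5],
    [1.5, 1.0, 0.0, 0.5],
    [2.0, 1.5, 0.5, 0.0],
]
MEMBERSHIPS, CENTERS = fuzzy_graph_clustering(PATH_D, {0, 1})
```

On this small path the loop settles after two passes, with vertices 0 and 2 as centers and vertex 1 receiving membership 0.8 in the first cluster.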
After $U$ converges, two groups of fuzzy memberships are available for all the vertices in the graph. A vertex ordering is constructed by sorting the vertices according to the grades of either one of the two groups of fuzzy memberships, and the ordered vertices are then separated into two even subsets with minimum cut for graph bisection.
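The final separation step can be sketched as a sort followed by a split at the midpoint. The function name and the sample membership rows are hypothetical, and the choice to give the first subset the smaller half when the number of vertices is odd follows the definition $|V_1| = |V_2| - 1$ from the Introduction. The sketch only enforces the size constraint; it does not by itself certify that the resulting cut is minimum.

```python
from typing import List, Sequence, Set, Tuple


def bisect_by_membership(u_row: Sequence[float]) -> Tuple[Set[int], Set[int]]:
    """Split vertices into two even subsets by sorting one row of memberships."""
    n = len(u_row)
    # Highest membership first; equal grades keep their vertex-index order.
    order: List[int] = sorted(range(n), key=lambda k: u_row[k], reverse=True)
    half = n // 2
    return set(order[:half]), set(order[half:])


# Hypothetical converged memberships of four vertices in the first cluster.
PART1, PART2 = bisect_by_membership([1.0, 0.8, 0.0, 0.06])
```

Here the ordering is 0, 1, 3, 2, so the two subsets are `{0, 1}` and `{2, 3}`.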
Acknowledgement

We thank David Gries and the anonymous referees for many helpful suggestions.

References

[1] T. Bui, C. Heigham, C. Jones and T. Leighton, Improving the performance of the Kernighan-Lin and simulated annealing graph bisection algorithms, in Proc. ACM/IEEE 26th Design Automation Conf. (1989) 775-778.
[2] R.L. Cannon, J.V. Dave and J.C. Bezdek, Efficient implementation of the fuzzy c-means clustering algorithms, IEEE Trans. Pattern Analysis Machine Intelligence 8 (1986) 248-255.
[3] C.M. Fiduccia and R.M. Mattheyses, A linear-time heuristic for improving network partitions, in Proc. ACM/IEEE 19th Design Automation Conf. (1982) 175-181.
[4] M.R. Garey and D.S. Johnson, Computers and Intractability (Freeman, San Francisco, CA, 1979).
[5] B.W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Tech. J. 49 (1970) 291-307.
[6] T. Kim, J.C. Bezdek and R.J. Hathaway, Optimality tests for fixed points of the fuzzy c-means algorithm, Pattern Recognition 21 (1988) 651-663.
[7] Y.G. Saab and V.B. Rao, Fast effective heuristics for the graph bisectioning problem, IEEE Trans. Computer-Aided Design 9 (1990) 91-98.
[8] Y.G. Saab and V.B. Rao, On the graph bisection problem, IEEE Trans. Circuits Systems I Fund. Theory Appl. 39 (1992) 760-762.