Clustering Documents with Labeled and Unlabeled Documents using Fuzzy Semi-Kmeans


Fuzzy Sets and Systems 221 (2013) 48 – 64

www.elsevier.com/locate/fss


Chien-Liang Liu a,∗, Tao-Hsing Chang b, Hsuan-Hsun Li c

a Information and Communications Research Laboratories, Industrial Technology Research Institute, Rm. 709, Bldg. 51, 195, Sec. 4, Chung Hsing Rd., Chutung, Hsinchu 310, Taiwan, ROC

b Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Chien Kung Campus 415, Chien Kung Road, Kaohsiung 807, Taiwan, ROC

c Department of Computer Science, National Chiao Tung University, 1001 University Road, Hsinchu 300, Taiwan, ROC

Received 18 February 2012; received in revised form 6 January 2013; accepted 8 January 2013

Available online 16 January 2013

Abstract

While focusing on document clustering, this work presents a fuzzy semi-supervised clustering algorithm called fuzzy semi-Kmeans. Fuzzy semi-Kmeans is an extension of the K-means clustering model, and it is inspired by the EM algorithm and the Gaussian mixture model. Additionally, fuzzy semi-Kmeans provides the flexibility to employ different fuzzy membership functions to measure the distance between data. This work employs a Gaussian weighting function to conduct experiments, but a cosine similarity function can be used as well. This work conducts experiments on three data sets and compares fuzzy semi-Kmeans with several methods. The experimental results indicate that fuzzy semi-Kmeans can generally outperform the other methods.

Keywords: Fuzzy clustering; Semi-supervised learning; Text mining; Fuzzy semi-Kmeans

1. Introduction

Text clustering has attracted an increasing amount of interest recently. Clustering can be used to automatically group retrieved documents into a list of meaningful categories and it is also one of the most widely used techniques for exploratory data analysis, since it can capture the natural structure of the data. Compared with supervised learning, clustering is an unsupervised learning approach, so it does not need labeled data during the course of clustering and its goal is to assign objects into groups so that objects from the same cluster are more similar to each other than objects from different clusters.

Clustering algorithms can be roughly divided into discriminative and generative types. Discriminative algorithms employ pairwise similarities between documents to determine an objective function and optimize this function to obtain the clustering result. K-means is a typical discriminative algorithm, which aims at the minimization of the average squared distance between the objects and the cluster centers. K-means is a hard assignment clustering algorithm, where each object belongs to exactly one cluster. However, hard assignment may lead to some problems for those objects that are located in the boundaries among cluster centers. Fuzzy C-Means (FCM) clustering [1], which is a soft version of K-means, allows an object to belong to two or more clusters. Each object has a membership degree to indicate its degree of belonging to each cluster. On the other hand, generative algorithms assume that the data is modeled by underlying parametric distributions, and the objective is to estimate the parameters from observed data. Then, cluster centers can be further obtained from models and their parameters. The Gaussian mixture model is a typical generative algorithm, where a mixture of multiple Gaussian distributions is employed to model the data. A variety of approaches to the problem of mixture decomposition have been proposed, many of which focus on maximum likelihood methods such as expectation maximization (EM) [2] or maximum a posteriori (MAP) estimation. The fuzzy semi-Kmeans algorithm proposed in this work is an extension of K-means, but employs the EM technique to incorporate a fuzzy membership function to allow the objects to belong to more than one cluster.

∗ Corresponding author. Tel.: +886 3 5913799; fax: +886 3 5820098.
E-mail address: [email protected] (C.-L. Liu).

Although unsupervised learning approaches do not need labeled data to cluster the documents, proper seeding biases clustering toward a good region of the search space [3]. Meanwhile, it is very common that the experimenter possesses some background knowledge that could be useful in clustering the data. Basically, the background knowledge can be encoded as constraints of the clustering, and they should be satisfied when the clustering process is completed. Restated, the semi-supervised clustering problem can be encoded as an optimization problem with constraints.

In general, the objects should be transformed into a collection of feature vectors in advance of the machine learning process. For instance, a spam mail detection application has to transform each email into a term vector to represent email features, and then a classifier can classify each email as spam or non-spam according to the feature vector. This work employs the vector space model and the bag-of-words model to represent a document. A document is represented as an unordered collection of words, disregarding grammar and even word order. Clearly, each document is located in a high-dimensional space. One approach to simplification is to employ a dimensionality reduction technique. This work proposes to employ the probabilistic latent semantic analysis (PLSA) clustering model [4,5] to reduce dimensionality. Essentially, PLSA, which is inspired and influenced by latent semantic analysis (LSA), aims to analyze the co-occurrences of terms in a corpus of documents to find hidden/latent topics within the corpus. PLSA is a generative model and it is based on a mixture decomposition derived from a latent class model. The reduction process transforms each document from a term vector into a topic vector. Then, fuzzy semi-Kmeans performs semi-supervised clustering on the topic space. The fuzzy semi-Kmeans uses initial labeled examples for seeding. These seeds are used to initialize centers and keep the grouping of labeled data unchanged throughout the clustering process. Essentially, fuzzy semi-Kmeans can employ different fuzzy membership functions to measure the distances between each document and cluster centers. This work employs a Gaussian weighting function to measure each document’s class membership, but cosine similarity can function properly as well.
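The bag-of-words representation described above can be illustrated with a minimal sketch. The function name `term_document_matrix`, the whitespace tokenizer, and the toy documents are assumptions made for illustration only; the paper does not specify its preprocessing.

```python
from collections import Counter

import numpy as np


def term_document_matrix(docs):
    """Return (H, vocab) where H[i, j] counts occurrences of vocab[j] in docs[i]."""
    # Assumption: lower-casing and whitespace splitting stand in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    index = {w: j for j, w in enumerate(vocab)}
    H = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, tokens in enumerate(tokenized):
        for w, n in Counter(tokens).items():
            H[i, index[w]] = n  # word order is discarded, only counts remain
    return H, vocab


docs = ["cheap pills cheap offer", "meeting agenda offer"]
H, vocab = term_document_matrix(docs)
```

Each row of `H` is one document and each column one vocabulary word, so the number of columns grows with the vocabulary, which is why a dimensionality reduction step is applied before clustering.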

The experimental results indicate that fuzzy semi-Kmeans is stable even though only a small amount of labeled data is available. Meanwhile, fuzzy semi-Kmeans generally outperforms the other semi-supervised learning methods. In many real applications, background knowledge is readily available, making it appropriate to employ background knowledge to make the learning faster and more effective. Although unsupervised learning does not need labeled data, the experimental results show that a small amount of labeled data can effectively improve the performance.

The main contribution of this work is a novel fuzzy semi-supervised learning algorithm called fuzzy semi-Kmeans. Moreover, this work proposes to use the PLSA clustering model to reduce dimensionality. This work uses three data sets in the experiments and compares the proposed fuzzy semi-Kmeans with several state-of-the-art algorithms. The experimental results indicate that fuzzy semi-Kmeans is robust when only a small amount of labeled data is available and it generally outperforms several semi-supervised learning methods.

The rest of this work is organized as follows. Section 2 presents related surveys. Section 3 then introduces the fuzzy semi-Kmeans algorithm. Next, Section 4 summarizes the results of several experiments. Conclusions are finally drawn in Section 5.

2. Related surveys

Semi-supervised learning, learning with both labeled and unlabeled data, has recently been studied by many researchers. A variety of semi-supervised algorithms have been proposed, including co-training [6,7], semi-supervised Naive Bayes [8], transductive support vector machines (TSVM) [9], fuzzy clustering models [10–13], graph-based approaches [14,15], and clustering-based approaches [16,3,17]. Semi-supervised learning methods can be further classified into semi-supervised classification and semi-supervised clustering methods. Semi-supervised classification employs the labeled data along with unlabeled data to construct a more accurate classifier, whereas semi-supervised clustering employs a small amount of labeled data to bias the clustering of unlabeled data.


Basically, unsupervised clustering does not need labeled data during the course of clustering and its goal is to assign objects into groups so that objects from the same cluster are more similar to each other than objects from different clusters. Thus, many clustering algorithms aim at the minimization of the cost function, which involves a distortion measure between the objects and the cluster representatives. Intuitively, semi-supervised clustering can become an optimization problem with constraints, since labeled examples can be encoded as constraints of the clustering. This technique has been widely used by many researchers [10,12,18]. For instance, Bouchachia and Pedrycz [12] developed a semi-supervised clustering algorithm based on a modified Fuzzy C-Means objective function. In addition to the original FCM objective function, the labeled examples are encoded as an additional regularization term in the complete objective function. Miyamoto et al. [18] employed the same technique in fuzzy semi-supervised clustering. They introduced two variants of FCM that regard labeled data as a regularization term of the objective function. Many classification or clustering algorithms rely on a distance measure between patterns to determine the pattern similarities, so defining an appropriate distance measure between patterns is crucial to many machine learning algorithms. Bouchachia and Pedrycz [19] investigated and quantified the effect of various distance measures on the FCM performance. Essentially, metric learning and dimensionality reduction are highly related, since learning a Mahalanobis metric is identical to learning a linear subspace of the data. Huang and Zhang [20] devised locality sensitive clustering algorithms to preserve locality information in dimensionality reduction.
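The relation between a Mahalanobis metric and a linear map mentioned above can be checked numerically. The sketch below assumes a metric of the form M = LᵀL for a hypothetical random matrix `L`; the variable names are illustrative and do not come from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 5))      # hypothetical linear map from 5-D to 2-D
M = L.T @ L                      # induced positive semidefinite metric

x, y = rng.normal(size=5), rng.normal(size=5)
diff = x - y

# Squared Mahalanobis distance under M ...
mahalanobis_sq = diff @ M @ diff
# ... equals the squared Euclidean distance after projecting with L.
projected_sq = np.sum((L @ x - L @ y) ** 2)
```

Because `L` has fewer rows than columns here, the same matrix that defines the metric also defines a projection onto a lower-dimensional subspace.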

Moreover, the background knowledge can also be encoded as pairwise constraints of the clustering, and they should be satisfied when the clustering process is completed. Wagstaff et al. [16] devised a semi-supervised variant of K-means called COP-KMeans to employ constraints to represent background knowledge. There are two types of constraints, must-link (two instances have to be together in the same cluster) and cannot-link (two instances have to be in different clusters), and they are used in the clustering process to generate a partition that satisfies all the given constraints. Basu et al. [3] introduced two semi-supervised variants of K-means clustering that use initial labeled data for seeding. These two algorithms are Seeded-KMeans and Constrained-KMeans. In Seeded-KMeans, the seeds are only used to initialize the K-means algorithm, and they are not used in the clustering algorithm. In Constrained-KMeans, the seeds are used to initialize centers and keep the grouping of labeled data unchanged throughout the clustering process. Their experimental results showed that Constrained-KMeans outperforms Seeded-KMeans. Zhong’s experimental results [21] supported the same conclusion. The fuzzy semi-Kmeans proposed in this work is inspired by Constrained-KMeans to use the seeds to initialize centers and keep the grouping of labeled data unchanged throughout the clustering process. However, fuzzy semi-Kmeans further employs EM to perform soft cluster assignment, which can incorporate different fuzzy membership functions into the algorithm, while Constrained-KMeans only allows each object to belong to exactly one cluster.
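The difference between the two seeding variants can be sketched as follows. This is a simplified illustration based only on the description above, not the reference implementation of [3]; the function name, the `seeds` dictionary format, the fixed iteration count, and the toy data are assumptions.

```python
import numpy as np


def seeded_or_constrained_kmeans(X, seeds, constrained, n_iter=20):
    """seeds: dict {cluster index: list of row indices of X carrying that label}."""
    C = len(seeds)
    # Both variants initialise each center as the mean of its seeds.
    centers = np.stack([X[seeds[c]].mean(axis=0) for c in range(C)])
    for _ in range(n_iter):
        # Hard assignment: nearest center in squared Euclidean distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        if constrained:
            # Constrained variant: seed labels are clamped in every iteration.
            for c, idx in seeds.items():
                labels[idx] = c
        # Keep the old center if a cluster happens to become empty.
        centers = np.stack([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(C)
        ])
    return labels, centers


X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [5., 5.], [5., 6.], [6., 5.], [1., 1.]])
seeds = {0: [0], 1: [3, 6]}   # point 6 is labelled 1 although it lies near cluster 0
labels_seeded, _ = seeded_or_constrained_kmeans(X, seeds, constrained=False)
labels_constrained, _ = seeded_or_constrained_kmeans(X, seeds, constrained=True)
```

In this toy setting the seeded variant is free to move point 6 to the nearer cluster, while the constrained variant keeps its given label throughout.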

Besides the above approaches, there are many semi-supervised clustering approaches that are extended from the other algorithms. For instance, Finley and Joachims [22] presented an SVM algorithm that trains a clustering algorithm by adapting the item-pair similarity measure. Since all the constraints may not be satisfied, Wang et al. [23] developed an efficient soft-constraint algorithm to obtain a satisfactory clustering result so that the constraints are respected as much as possible. In spectral clustering, Ji et al. [24] proposed to incorporate prior knowledge of cluster membership for document cluster analysis. The prior knowledge indicates pairs of documents that have to be together in the same cluster. Then, the prior knowledge is transformed into a set of constraints. The document clustering task is accomplished by finding the best cuts of the graph under the constraints. Wang and Davidson [25] proposed a framework for a constrained spectral clustering algorithm, which preserves the original graph Laplacian and explicitly encodes the constraints.

3. Fuzzy semi-Kmeans

3.1. Notation

The notation used in the following sections is described in this section. Given a set of training documents $D = \{d_1, \ldots, d_N\}$, the goal is to assign each document to one of the predefined class labels $\mathcal{C} = \{1, \ldots, C\}$. Meanwhile, this work assumes that only a small subset of the documents $d_i \in D_l$ are given class labels $y_i \in \mathcal{C}$, and the rest of the documents, in subset $D_u$, do not contain class label information. Restated, the whole document collection can be divided into two sets, that is, $D = D_l \cup D_u$. Each document $d_i$ is considered to be an ordered list of word events, $w_{i,1}, \ldots, w_{i,M}$. This work uses $w_{i,j}$ to denote the word $w_j$ in the document $d_i$, where $w_j$ is a word in the vocabulary $W = \{w_1, \ldots, w_M\}$. The entry value for $w_{i,j}$ is represented as $n(d_i, w_j)$, meaning the number of times $w_j$ occurs in $d_i$. $P(d_i)$ is used to denote the probability that a word occurrence will be observed in a particular document $d_i$. $P(w_j \mid z_k)$ represents the class conditional probability of a specific word conditioned on the unobserved class variable $z_k$, and finally $P(z_k \mid d_i)$ denotes a document specific probability distribution over the latent variable space.

3.2. Dimensionality reduction

High-dimensional data sets present many mathematical challenges to machine learning tasks. One of the problems with high-dimensional data sets is that not all the measured variables are important for understanding the underlying phenomena of interest. One approach to simplification is to assume that the data of interest lies on an embedded linear subspace or non-linear manifold within the higher-dimensional space. Dimensionality reduction, which tries to find a lower dimensional representation of the data according to some criterion, is an active research field in machine learning. Many dimensionality reduction algorithms have been developed to accomplish these tasks. Principal component analysis (PCA) and multidimensional scaling (MDS) are classical methods that provide a sequence of best linear approximations to a given high-dimensional observation. In order to resolve the problem of dimensionality reduction in nonlinear cases, many recent techniques, including Isomap [26], locally linear embedding (LLE) [27], and Laplacian eigenmaps [28] have been proposed.

Hofmann et al. [29] proposed an unsupervised learning framework from dyadic data. Dyadic data refers to a domain with two sets of objects, $X = \{x_1, \ldots, x_N\}$ and $Y = \{y_1, \ldots, y_M\}$, in which observations are made for $(x_i, y_j)$ with their co-occurrence information. The dyadic data representation is commonly used in many application domains, such as text analysis, computer vision, and computational linguistics. In text analysis, $X$ represents a document collection and $Y$ represents the vocabulary set that appears in $X$. The co-occurrence information $(x_i, y_j)$ represents the number of times term $y_j$ occurs in document $x_i$. Given the above observations, latent semantic analysis (LSA) is a theory and method for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA applies singular value decomposition (SVD) to the term-document matrix, and a low-rank approximation of the matrix can be used to determine patterns in the relationships between the terms and concepts contained in the text. LSA has been successfully applied to various applications [30,31]. Inspired by LSA, Hofmann [5] proposed PLSA for factor analysis of binary and count data. As an unsupervised learning method, PLSA does not require labeled data. Additionally, PLSA is a generative model based on a mixture decomposition derived from a latent class model. The latent variable introduced by PLSA can be viewed as topics or concepts embedded in the document collections.
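The SVD step of LSA described above can be sketched with NumPy. This is only an illustration of the low-rank approximation idea; the function name `lsa_embed`, the choice of rows as documents, and the toy count matrix are assumptions, and LSA is not the reduction method used in this work.

```python
import numpy as np


def lsa_embed(H, k):
    """Project the rows (documents) of a count matrix H onto k latent dimensions."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    docs_k = U[:, :k] * s[:k]          # document coordinates in concept space
    H_k = docs_k @ Vt[:k]              # rank-k approximation of H
    return docs_k, H_k


# Toy matrix: documents 0-1 use only words 0-1, documents 2-3 only words 2-3.
H = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 2, 1]], dtype=float)
docs_k, H_k = lsa_embed(H, k=2)
```

With `k=2` the approximation smooths the counts inside each block, which is the sense in which the truncated SVD exposes shared concepts across documents.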

The input of PLSA is a term-document matrix, and PLSA can decompose each document into a set of latent topic variables. Notably, the two PLSA models are the aspect model and the statistical clustering model [5,29]. In a clustering model for documents, the PLSA clustering model assumes that each document belongs to exactly one cluster. Conversely, the aspect model assumes that every occurrence of a word in a document is associated with a unique state $z_k$ of the latent class variable [5]. This work uses the PLSA clustering model to reduce dimensionality.

The PLSA clustering model is based on two assumptions: (1) the data is generated by a mixture model and (2) there is a correspondence between mixture components and classes. Under these assumptions, each document is generated using a mixture model, which is parameterized by $\theta$. The generating process can be described using two steps: select a mixture component based on the mixture weights and, then, generate a document based on this selected mixture component and its parameters. Thus, the likelihood of document $d_i$ is the sum of total probability over all mixture components, as shown in the following equation:

$$P(d_i \mid \theta) = \sum_{k=1}^{K} P(z_k)\, P(d_i \mid z_k; \theta) \tag{1}$$

The number of topics is $K$ and $z_k$ represents the $k$th component. The likelihood of $D$ is simply the product over all the documents, since each document is independent of the others, given the model. Then, the log likelihood of $D$ can be obtained using Eq. (2). The standard procedure for maximum likelihood estimation in latent variable models is the EM algorithm, which includes an E-step and an M-step. Similarly, the parameter estimation of the PLSA clustering model can be achieved by using the expectation maximization (EM) algorithm.

$$\ln P(D \mid \theta) = \sum_{d_i \in D} \ln \sum_{k=1}^{K} P(z_k)\, P(d_i \mid z_k; \theta) \tag{2}$$


Algorithm 1. PLSA clustering algorithm.

Input: An $N \times M$ term-document matrix $H$, and the number of topics $K$.

Output: An $N \times K$ document-topic distribution matrix $Q$, where each entry $Q_{ik}$ represents the probability of document $d_i$ being assigned to topic $k$.

begin
  Choose $K$ vectors from $H$ randomly; they are the initial values of $\theta_1, \ldots, \theta_K$
  Topic proportion components $\pi_1 = \cdots = \pi_K = \frac{1}{K}$
  repeat
    E-step: Compute the latent variable posterior probability $Q$:
      $Q_{ik} = P(z_k \mid d_i) = \pi_k \exp\left( \sum_{j=1}^{M} n(d_i, w_j) \ln \theta_{kj} \right)$
    M-step: Update the proportion parameter $\pi_k$ and $\theta_k$ for $k = 1, \ldots, K$:
      $\pi_k = P(z_k) = \dfrac{\sum_{i=1}^{N} Q_{ik}}{\sum_{k=1}^{K} \sum_{i=1}^{N} Q_{ik}}$
      $\theta_{kj} = \dfrac{\sum_{i=1}^{N} P(z_k \mid d_i)\, n(d_i, w_j)}{\sum_{j=1}^{M} \sum_{i=1}^{N} P(z_k \mid d_i)\, n(d_i, w_j)} = \dfrac{\sum_{i=1}^{N} Q_{ik}\, n(d_i, w_j)}{\sum_{j=1}^{M} \sum_{i=1}^{N} Q_{ik}\, n(d_i, w_j)}$, where $j = 1, \ldots, M$
  until convergence
  return $Q$
end

Algorithm 1 shows the PLSA clustering algorithm. The inputs of the algorithm include the term-document matrix $H$ and the number of topics $K$. The initial value of $\theta_k$ is determined randomly. The E-step and M-step are estimated according to the equations listed in Algorithm 1. The output is a document-topic distribution matrix $Q$. The dimensionality reduction can be achieved by using the document-topic distribution matrix $Q$, since each document can be represented by a topic distribution vector.
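The EM loop of Algorithm 1 can be sketched in NumPy as follows. This is a hedged illustration, not the authors' implementation: the names `theta` (per-topic word distributions) and `pi` (topic proportions) are labels chosen here, and several details are assumptions, namely the add-one smoothing of the initial word distributions, the explicit per-document normalization of the posterior, the computation in log space, and a fixed iteration count in place of a convergence test.

```python
import numpy as np


def plsa_clustering(H, K, n_iter=50, seed=0, eps=1e-12):
    """EM for a mixture of multinomials; returns the N x K posterior matrix Q."""
    rng = np.random.default_rng(seed)
    N, M = H.shape
    # Initialise theta from K randomly chosen rows (assumed add-one smoothing
    # keeps every log() finite).
    theta = H[rng.choice(N, size=K, replace=False)].astype(float) + 1.0
    theta /= theta.sum(axis=1, keepdims=True)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: log pi_k + sum_j n(d_i, w_j) log theta_kj, normalised per document.
        log_q = np.log(pi + eps) + H @ np.log(theta + eps).T
        log_q -= log_q.max(axis=1, keepdims=True)   # numerical stability only
        Q = np.exp(log_q)
        Q /= Q.sum(axis=1, keepdims=True)
        # M-step: re-estimate topic proportions and per-topic word distributions.
        pi = Q.sum(axis=0) / Q.sum()
        theta = Q.T @ H
        theta /= theta.sum(axis=1, keepdims=True) + eps
    return Q


H = np.array([[5, 4, 0, 0],
              [4, 6, 0, 0],
              [0, 0, 5, 5],
              [0, 1, 6, 4]])
Q = plsa_clustering(H, K=2)
labels = Q.argmax(axis=1)
```

Each row of `Q` is a K-dimensional topic vector that sums to one, which is the reduced representation passed on to the clustering stage.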

3.3. Fuzzy semi-Kmeans

Many clustering algorithms aim at the minimization of the cost function, which involves distortion measure between the objects and the cluster representatives. The K-means locally minimizes the average squared distance between the objects and the cluster centers. The Fuzzy C-Means has similar objective function, but it extends K-means to include the degree of membership information, which indicates the confidence in the assignment of the object to the cluster. The above two clustering algorithms can be derived from the optimization of cost functions.

The EM algorithm is another popular technique for analyzing clustering algorithms, since it is a statistically formalized method and it provides more detailed information about the clustering result. As mentioned above, the EM algorithm is a general method of finding maximum likelihood solutions for models having latent variables. If the set of all observed data is $X = \{x_1, \ldots, x_N\}$, the set of model parameters is denoted by $\theta$ and the set of all latent variables is $Z$, the E-step employs the current parameter values $\theta^{\mathrm{old}}$ to estimate the posterior distribution of the latent variables given by $P(Z \mid X, \theta^{\mathrm{old}})$. Then, the posterior distribution is used to compute the expectation of the complete data log likelihood to estimate the new parameter value $\theta^{\mathrm{new}}$. The expectation of the complete data log likelihood over the posterior distribution of latent variables is denoted by $Q(\theta, \theta^{\mathrm{old}})$ as shown in Eq. (3). The M-step estimates the new parameter $\theta^{\mathrm{new}}$ from the maximization of this function as shown in Eq. (4).

$$Q(\theta, \theta^{\mathrm{old}}) = E_{Z \mid X, \theta^{\mathrm{old}}}\left[\ln P(X, Z \mid \theta)\right] \tag{3}$$

$$\theta^{\mathrm{new}} = \arg\max_{\theta}\, Q(\theta, \theta^{\mathrm{old}}) \tag{4}$$

K-means can be modeled using the EM algorithm on a mixture of $C$ Gaussians under certain assumptions, where $C$ is the number of clusters of K-means. Whereas the K-means algorithm performs a hard assignment of data points to clusters, in which each data point is associated uniquely with one cluster, the EM algorithm makes a soft assignment based on the posterior probabilities. Consider a Gaussian mixture model with $C$ components in which the means of these components are $\mu_1, \ldots, \mu_C$ and the common covariance matrix of the mixture components is a scalar multiple of the identity matrix $I$. The expected complete data log likelihood can then be written as [32]

$$E_{Z \mid X, \theta^{\mathrm{old}}}\left[\ln P(X, Z \mid \theta)\right] = -\frac{1}{2} \sum_{i=1}^{N} \sum_{c=1}^{C} P(z_c \mid x_i; \theta)\, \lVert x_i - \mu_c \rVert^2 + \text{const} \tag{5}$$

Thus, the maximization of the above expected complete data log likelihood is equivalent to the minimization of the K-means objective function with the hard assignment restriction, that is, each data point is assigned to one cluster as shown in the following equation:

$$P(z_c \mid x_i; \theta) = \begin{cases} 1 & \text{if } c = \arg\min_{l} \lVert x_i - \mu_l \rVert^2 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$
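The hard assignment of Eq. (6) can be written directly as a one-hot posterior matrix. The sketch below is illustrative; the function name `hard_posterior` and the toy points and centers are assumptions.

```python
import numpy as np


def hard_posterior(X, centers):
    """Eq. (6): one-hot rows marking the nearest center of each point."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    P = np.zeros_like(sq_dist)
    P[np.arange(len(X)), sq_dist.argmin(axis=1)] = 1.0
    return P, sq_dist


X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0]])
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
P, sq_dist = hard_posterior(X, centers)
# Sum of posterior-weighted squared distances, i.e. the K-means distortion.
objective = (P * sq_dist).sum()
```

Replacing `P` with any row-normalized soft membership matrix leaves the weighted-distance expression unchanged in form, which is the opening that a fuzzy membership function exploits.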

Essentially, the posterior probability function is not limited to the hard assignment shown in Eq. (6); other soft assignments can be used as well. The connection between K-means and EM described above inspires us to propose a fuzzy semi-supervised K-means algorithm. The posterior probability function is replaced by a fuzzy membership function with constraints. Fuzzy membership can then be incorporated into the E-step of the EM algorithm. This study uses a matrix $U$ to keep track of fuzzy membership information, where $U_{ic}$ denotes the degree of membership of document $i$ in the cluster $c$. In other words, $U_{ic}$ is used to represent the posterior probability $P(z_c \mid x_i)$. In the M-step, $U_{ic}$ is used to estimate new model parameters, which are the cluster centers $\mu_1, \ldots, \mu_C$, from the maximization of the log likelihood function.

Moreover, a small number of labeled examples is available. The fuzzy semi-Kmeans employs these labeled examples for two main purposes. First, these labeled examples can determine the initial guess of the centers. Practically, clustering algorithms such as K-means and Fuzzy C-Means are sensitive to the initial guess. Thus, the seeds can provide a better guess for the algorithm to cluster the documents. Second, these labeled examples can bias clustering toward a better region of the search space during the course of clustering.

Algorithm 2. Fuzzy semi-Kmeans algorithm.

Input: An $N \times M$ term-document matrix $H$, the number of topics $K$, the number of clusters $C$ and the seeds $S_1, \ldots, S_C$. Without loss of generality, $S_c$ represents the document seeds for cluster $c$ ($c = 1, \ldots, C$).

Output: An $N \times C$ document-cluster membership matrix $U$

begin
  $H_i \leftarrow \dfrac{H_i}{\lVert H_i \rVert}$, where $H_i$ is the $i$th row of $H$ and $i = 1, \ldots, N$
  $\tilde{H} \leftarrow \text{PLSA\_Clustering}(H, K)$
  $\mu_c \leftarrow \dfrac{1}{|S_c|} \sum_{d_i \in S_c} \tilde{H}_i$, where $\mu_c$ is the center of the $c$th cluster and $c = 1, \ldots, C$
  repeat
    for $i = 1$ to $N$ do
      for $c = 1$ to $C$ do
        if $d_i$ is a document of $S_c$ then
          $U_{ic} \leftarrow 1$
          $U_{ic'} \leftarrow 0$, for $c' = 1, \ldots, C$ and $c' \neq c$
          break
        else
          $U_{ic} \leftarrow e^{-\lVert \tilde{H}_i - \mu_c \rVert^2 / 2\sigma^2}$
        end
      end
      Normalize $U_{ic}$ so that the sum of each row of $U$ is 1.
    end
    for $c = 1$ to $C$ do
      $Z_c \leftarrow \sum_{i=1}^{N} U_{ic}$
    end
    for $c = 1$ to $C$ do
      $\mu_c \leftarrow \dfrac{1}{Z_c} \sum_{i=1}^{N} U_{ic} \times \tilde{H}_i$
    end
  until convergence
  return $U$
end
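The main loop of Algorithm 2 can be sketched in NumPy. This is a hedged reading of the pseudocode, not the authors' code: it starts from already reduced topic vectors (`H_tilde`), and the function name, the `seeds` dictionary format, the default `sigma`, the convergence tolerance, and the subtraction of the row-wise minimum distance (which leaves the normalized memberships unchanged but avoids underflow) are assumptions.

```python
import numpy as np


def fuzzy_semi_kmeans(H_tilde, seeds, sigma=1.0, n_iter=100, tol=1e-6):
    """H_tilde: N x K topic vectors; seeds: {cluster index: list of row indices}."""
    C = len(seeds)
    # Initial centers are the means of the seed documents of each cluster.
    centers = np.stack([H_tilde[seeds[c]].mean(axis=0) for c in range(C)])
    for _ in range(n_iter):
        # E-step: Gaussian weighting of squared distances to every center.
        sq_dist = ((H_tilde[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        sq_dist -= sq_dist.min(axis=1, keepdims=True)   # cancels after normalisation
        U = np.exp(-sq_dist / (2.0 * sigma ** 2))
        # Seeds keep a crisp membership in their labelled cluster.
        for c, idx in seeds.items():
            U[idx] = 0.0
            U[idx, c] = 1.0
        U /= U.sum(axis=1, keepdims=True)
        # M-step: centers are membership-weighted means of the topic vectors.
        new_centers = (U.T @ H_tilde) / U.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return U, centers


H_tilde = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.7, 0.3]])
seeds = {0: [0], 1: [2]}
U, centers = fuzzy_semi_kmeans(H_tilde, seeds, sigma=0.2)
```

Swapping the Gaussian weighting line for a cosine similarity between `H_tilde` rows and `centers` would give the alternative membership function mentioned in the text; only the E-step changes.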


Table 9
Top 10 frequent terms for talk data set.

| Ranking | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| 1 | tho | amid | calgari | bondag |
| 2 | survivalist | busload | vancouver | kej |
| 3 | rcander | jen | sloan | bidder |
| 4 | guy | unintention | kellerman | now |
| 5 | atf | cue | blackman | ccrtkba |
| 6 | ncoast | bimac | criminologist | saf |
| 7 | assault | gideon | canada | sumpin |
| 8 | romp | hasan | nejm | nit |
| 9 | oleari | polari | regimen | racket |
| 10 | cbnewsh | async | stricter | employers |

fuzzy semi-Kmeans with semi-supervised classification methods and semi-supervised clustering methods, the proposed method can generally outperform the other methods.

5. Conclusion

This work focuses on semi-supervised clustering and proposes a novel algorithm called fuzzy semi-Kmeans to perform document clustering with a small number of labeled documents. This algorithm extends the K-means clustering model and uses the seeds to bias clustering toward a good region of the search space. Moreover, fuzzy semi-Kmeans provides the flexibility to employ different fuzzy membership functions to measure the distance between data. This work employs a Gaussian weighting function to conduct experiments, but a cosine similarity function can be used as well. This work conducts experiments on three data sets and compares fuzzy semi-Kmeans with several methods. The experimental results indicate that fuzzy semi-Kmeans can generally outperform the other methods. Even though the boundaries among clusters are not clear, fuzzy semi-Kmeans can function properly. In many real applications, background knowledge is readily available, so it is appropriate to employ background knowledge to make the learning faster and more effective.

Acknowledgment

This work was supported in part by the National Science Council under the Grants NSC-101-2221-E-009-163.

References

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer Academic Publishers, Norwell, MA, USA, 1981.

[2] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B 39 (1) (1977) 1–38.

[3] S. Basu, A. Banerjee, R.J. Mooney, Semi-supervised clustering by seeding, in: Proceedings of the 19th International Conference on Machine Learning, ICML’02, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002, pp. 27–34.

[4] T. Hofmann, Probabilistic latent semantic analysis, in: Proceedings of Uncertainty in Artificial Intelligence, UAI ’99, 1999.

[5] T. Hofmann, Unsupervised learning by probabilistic latent semantic analysis, Mach. Learn. 42 (1–2) (2001) 177–196.

[6] A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the 11th Annual Conference on Computational Learning Theory, COLT ’98, ACM, New York, NY, USA, 1998, pp. 92–100.

[7] W. Wang, Z.-H. Zhou, A new analysis of co-training, in: J. Fürnkranz, T. Joachims (Eds.), Proceedings of the 27th International Conference on Machine Learning (ICML-10), Omnipress, Haifa, Israel, 2010, pp. 1135–1142.

[8] K. Nigam, A.K. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Mach. Learn. 39 (2000) 103–134.

[9] T. Joachims, Making large-scale support vector machine learning practical, Advances in Kernel Methods, MIT Press, Cambridge, MA, USA, 1999, pp. 169–184.

[10] W. Pedrycz, J. Waletzky, Fuzzy clustering with partial supervision, IEEE Trans. Syst. Man Cybern., Part B Cybern. 27 (5) (1997) 787–795.

[11] A.M. Bensaid, L.O. Hall, J.C. Bezdek, L.P. Clarke, Partially supervised clustering for image segmentation, Pattern Recognition 29 (1996) 859–871.

[12] A. Bouchachia, W. Pedrycz, A semi-supervised clustering algorithm for data exploration, in: Proceedings of the 10th International Fuzzy Systems Association World Congress Conference on Fuzzy Sets and Systems, IFSA ’03, Springer-Verlag, Berlin, Heidelberg, 2003, pp. 328–337.


[13] Y. Hamasuna, Y. Endo, S. Miyamoto, Semi-supervised fuzzy C-means clustering using clusterwise tolerance based pairwise constraints, in: Proceedings of the 2010 IEEE International Conference on Granular Computing, GRC ’10, IEEE Computer Society, Washington, DC, USA, 2010, pp. 188–193.

[14] A. Blum, S. Chawla, Learning from labeled and unlabeled data using graph mincuts, in: Proceedings of the 18th International Conference on Machine Learning, ICML ’01, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001, pp. 19–26.

[15] A.B. Goldberg, X. Zhu, Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization, in: Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing, TextGraphs-1, Association for Computational Linguistics, Stroudsburg, PA, USA, 2006, pp. 45–52.

[16] K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl, Constrained K-means clustering with background knowledge, in: Proceedings of the 18th International Conference on Machine Learning, ICML ’01, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001, pp. 577–584.

[17] S. Basu, M. Bilenko, R.J. Mooney, A probabilistic framework for semi-supervised clustering, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, ACM, New York, NY, USA, 2004, pp. 59–68.

[18] S. Miyamoto, M. Yamazaki, W. Hashimoto, Fuzzy semi-supervised clustering with target clusters using different additional terms, in: IEEE International Conference on Granular Computing, 2009, GRC ’09, 2009, pp. 444–449.

[19] A. Bouchachia, W. Pedrycz, Enhancement of fuzzy clustering by mechanisms of partial supervision, Fuzzy Sets Syst. 157 (13) (2006) 1733–1759.

[20] P. Huang, D. Zhang, Locality sensitive C-means clustering algorithms, Neurocomputing 73 (16–18) (2010) 2935–2943.

[21] S. Zhong, Semi-supervised model-based document clustering: a comparative study, Mach. Learn. 65 (2006) 3–29.

[22] T. Finley, T. Joachims, Supervised clustering with support vector machines, in: Proceedings of the 22nd International Conference on Machine Learning, ICML ’05, ACM, New York, NY, USA, 2005, pp. 217–224.

[23] J. Wang, S. Wu, H.Q. Vu, G. Li, Text document clustering with metric learning, in: Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, ACM, New York, NY, USA, 2010, pp. 783–784.

[24] X. Ji, W. Xu, Document clustering with prior knowledge, in: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’06, ACM, New York, NY, USA, 2006, pp. 405–412.

[25] X. Wang, I. Davidson, Flexible constrained spectral clustering, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, ACM, New York, NY, USA, 2010, pp. 563–572.

[26] J.B. Tenenbaum, V. Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (5500) (2000) 2319–2323.

[27] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.

[28] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003) 1373–1396.

[29] T. Hofmann, J. Puzicha, M.I. Jordan, Learning from dyadic data, in: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, MIT Press, Cambridge, MA, USA, 1999, pp. 466–472.

[30] S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, R.A. Harshman, Indexing by latent semantic analysis, J. Am. Soc. Inf. Sci. 41 (1990) 391–407.

[31] C.-L. Liu, W.-H. Hsaio, C.-H. Lee, G.-C. Lu, E. Jou, Movie rating and review summarization in mobile environment, IEEE Trans. Syst. Man Cybern., Part C 42 (3) (2012) 397–407.

[32] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York Inc., Secaucus, NJ, USA, 2006.

[33] X. Shi, B.L. Tseng, L.A. Adamic, Information diffusion in computer science citation networks, in: E. Adar, M. Hurst, T. Finin, N.S. Glance, N. Nicolov, B.L. Tseng (Eds.), International Conference on Weblogs and Social Media, The AAAI Press, 2009.

[34] C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, New York, NY, USA, 2008.

[35] M. Sokolova, L. Guy, A systematic analysis of performance measures for classification tasks, Inf. Process. Manage. 45 (2009) 427–437.

[36] A.B. Goldberg, X. Zhu, Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization, in: TextGraphs ’06: Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing on the First Workshop on Graph Based Methods for Natural Language Processing, Association for Computational Linguistics, Morristown, NJ, USA, 2006, pp. 45–52.

[37] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, software available at http://www.csie.ntu.edu.tw/cjlin/libsvm, 2001.