
Semantic Based Clustering of Web Documents


Academic year: 2021



蔣以仁

Tsau Young Lin; I-Jen Chiang

Abstract

A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimensional simplex, and a concept is represented by a connected component. Based on these structures, the documents can be clustered into meaningful classes.

Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and hierarchical agglomerative clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.
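The geometric model described in the abstract can be illustrated with a small sketch. This is only one possible reading, not the authors' implementation: it assumes that terms are the vertices, that a set of terms forms a simplex when the terms co-occur in at least `min_support` documents, and that a document is assigned to the connected component whose terms it overlaps most. The function names, the `min_support` threshold, the `max_dim` cap, and the overlap-based assignment rule are all hypothetical choices made for illustration.

```python
from itertools import combinations


def frequent_simplices(docs, min_support, max_dim=3):
    """Term sets co-occurring in at least min_support documents (assumed rule).

    A simplex of dimension k has k + 1 vertices, so sizes run up to max_dim + 1.
    Brute force over combinations; only suitable for tiny vocabularies.
    """
    vocab = sorted(set().union(*docs))
    simplices = []
    for size in range(1, max_dim + 2):
        for terms in combinations(vocab, size):
            support = sum(1 for d in docs if set(terms) <= d)
            if support >= min_support:
                simplices.append(frozenset(terms))
    return simplices


def maximal_simplices(simplices):
    """Simplices not strictly contained in another one (the 'primitive concepts')."""
    return [s for s in simplices if not any(s < t for t in simplices)]


def connected_components(simplices):
    """Group vertices joined through shared simplices, using union-find."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s in simplices:
        verts = list(s)
        for v in verts:
            parent.setdefault(v, v)
        for v in verts[1:]:
            parent[find(v)] = find(verts[0])

    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def cluster_documents(docs, components):
    """Label each document by the component it overlaps most; -1 if none (assumed rule)."""
    labels = []
    for d in docs:
        overlaps = [len(d & c) for c in components]
        if not overlaps or max(overlaps) == 0:
            labels.append(-1)
        else:
            labels.append(overlaps.index(max(overlaps)))
    return labels


# Toy corpus: each document is a set of terms (made-up data for illustration).
docs = [
    {"wall", "street", "stock"},
    {"wall", "street", "market"},
    {"wall", "street", "stock"},
    {"neural", "network", "learning"},
    {"neural", "network", "model"},
    {"neural", "network", "learning"},
]

simplices = frequent_simplices(docs, min_support=2)
primitive_concepts = maximal_simplices(simplices)
concepts = connected_components(primitive_concepts)
labels = cluster_documents(docs, concepts)
```

On this toy corpus the two maximal simplices are disjoint, so each one forms its own connected component, and the documents split into two groups accordingly. Brute-force enumeration is used only to keep the sketch short; a real corpus would need a proper frequent-itemset miner.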
