2013
DOI: 10.5120/12787-0024

Document Clustering: A Review

Abstract: As the internet is exploding with a huge volume of text documents, the need to group similar documents together for versatile applications has held the attention of researchers in this area. Document clustering can facilitate the tasks of document organization and web browsing, search engine results, corpus summarization, document classification, information retrieval and filtering. Several attempts have been made to develop efficient document clustering algorithms, but most of the clusterin…

Cited by 13 publications (2 citation statements)
References 15 publications
“…Some detailed document clustering reviews are addressed in Shah and Mahajan (2012) and Premalatha and Natarajan (2010) which consist in describing the general document clustering process and its challenges, focusing mainly on extensions of K‐means applied in the context of document clustering and the conventional hierarchical clustering algorithms. Furthermore, in addition to the aforementioned works, Bisht and Paul (2013) analyze also the frequent itemset based clustering approach which consists of a set of techniques that do not require the vector space model representation of the corpus.…”
Section: Introduction
mentioning
confidence: 99%
“…
• Document Representation: Most of the clustering approaches use the vector space model for document representation. The m*n matrix is represented by the collection of n documents with the m unique words where each document is a vector of m dimension [5].
• Document Clustering: At this stage, the target documents are grouped into different clusters by selected features [8].…”
mentioning
confidence: 99%
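
The vector space model described in the statement above can be illustrated with a minimal sketch. It is not taken from the cited papers: the function name `term_document_matrix`, the lowercase whitespace tokenization, and the use of raw term counts as matrix entries are all assumptions made for illustration. It builds an m*n matrix for n documents and m unique words, so that each document corresponds to one column of length m.

```python
from collections import Counter


def term_document_matrix(documents):
    """Build a term-document count matrix (hypothetical helper).

    Returns (vocabulary, matrix), where matrix has one row per unique
    word (m rows) and one column per document (n columns).
    Assumption: tokens are lowercase, whitespace-separated words.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # The m unique words across the whole collection, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    # Per-document word counts; Counter returns 0 for missing words.
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix


def document_vector(matrix, j):
    """Return document j as a vector of m dimensions (column j)."""
    return [row[j] for row in matrix]


docs = [
    "clustering groups similar documents",
    "documents are vectors of words",
    "similar vectors form clusters",
]
vocab, matrix = term_document_matrix(docs)
first_doc = document_vector(matrix, 0)
```

Here `matrix` has `len(vocab)` rows and `len(docs)` columns, and `first_doc` is the m-dimensional vector for the first document. Practical systems usually weight the entries (for example with TF-IDF) instead of using raw counts; raw counts are used here only to keep the sketch short.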