Document Clustering: A Review

Bisht, Sunita; Paul, Aloke

doi:10.5120/12787-0024

Cited by 13 publications

(2 citation statements)

References 15 publications


“…Some detailed document clustering reviews are addressed in Shah and Mahajan (2012) and Premalatha and Natarajan (2010) which consist in describing the general document clustering process and its challenges, focusing mainly on extensions of K-means applied in the context of document clustering and the conventional hierarchical clustering algorithms. Furthermore, in addition to the aforementioned works, Bisht and Paul (2013) analyze also the frequent itemset based clustering approach which consists of a set of techniques that do not require the vector space model representation of the corpus.…”

Section: Introduction

mentioning

confidence: 99%

Document clustering

Cozzolino

Ferraro

2022

WIREs Computational Stats


Nowadays, the explosive growth in text data emphasizes the need for new, computationally efficient methods and credible theoretical support tailored to analyzing such large-scale data. Most of this unstructured data is not classified, hence unsupervised learning techniques prove useful in this field. Document clustering has proven to be an efficient tool for organizing textual documents and has been widely applied in areas ranging from information retrieval to topic modeling. Before introducing the proposals of document clustering algorithms, the principal steps of the whole process, including the mathematical representation of documents and the preprocessing phase, are discussed. Then, the main clustering algorithms used for text data are critically analyzed, considering prototype-based, graph-based, hierarchical, and model-based approaches. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Statistical Learning and Exploratory Methods of the Data Sciences > Text Mining; Data: Types and Structure > Text Data. Keywords: document clustering, document representation, graph-based methods, hierarchical methods, model-based methods, prototype-based methods, text data
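The citation statement above mentions K-means extensions, and the abstract groups them under prototype-based approaches. As a rough illustration only, here is a minimal sketch of plain K-means on length-normalized document vectors. It is not the method of any cited paper. The function names, the toy vectors, and the deterministic "first k points" initialization are all assumptions made for this example.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so document length does not dominate."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=10):
    """Plain K-means; returns one cluster label per input vector."""
    points = [normalize(v) for v in vectors]
    # Assumption: seed centroids with the first k points (deterministic,
    # chosen for reproducibility, not for clustering quality).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical term-count vectors for four documents over four terms.
toy_vectors = [[2, 1, 0, 0], [0, 0, 1, 2], [1, 2, 0, 0], [0, 0, 2, 1]]
labels = kmeans(toy_vectors, k=2)
```

On this toy input the documents that share terms end up with the same label. Real corpora usually need better seeding and a sparse vector representation.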


“…Document Representation: Most of the clustering approaches use the vector space model for document representation. The m*n matrix is represented by the collection of n documents with the m unique words where each document is a vector of m dimension [5]. Document Clustering: At this stage, the target documents are grouped into different clusters by selected features [8].…”

mentioning

confidence: 99%

Document Clustering based on the Similarity of Data with Efficient Time Consumption

Kumar

2018

IJCA


Text mining has become an emerging research area that helps extract useful information from large collections of natural-language text documents. The necessity of grouping documents for different applications is gaining attention. A comprehensive review of the techniques used to reduce time consumption is presented, together with challenges and research issues. The techniques reviewed are k-means clustering, fuzzy c-means clustering, support vector machine classifiers, the naive Bayes classifier, and the Hidden Markov Model (HMM). Furthermore, the advantages and disadvantages of each technique are discussed, and the techniques are compared with existing ones in terms of efficiency and computational time.
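The vector space model quoted in the citation statement above (an m × n matrix for n documents and m unique words, each document a vector of m dimensions) can be sketched as follows. This is a minimal illustration using raw term counts and whitespace tokenization. The function name and the sample documents are assumptions for the example and do not come from any of the cited papers.

```python
from collections import Counter

def term_document_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the count of
    vocabulary[i] in documents[j]; the shape is m terms by n documents."""
    # Assumption: naive lowercase whitespace tokenization. Real pipelines
    # usually add stop-word removal, stemming, and a weighting such as TF-IDF.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[count[term] for count in counts] for term in vocabulary]
    return vocabulary, matrix

# Hypothetical three-document corpus.
docs = ["cats chase mice", "dogs chase cats", "mice eat cheese"]
vocab, tdm = term_document_matrix(docs)

# Each column of the matrix is one document's m-dimensional vector.
doc_vectors = [list(column) for column in zip(*tdm)]
```

Here `tdm` has one row per unique word and one column per document, so `doc_vectors[j]` is the vector a clustering step would receive for document `j`.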


Document Clustering Using Different Unsupervised Learning Approaches: A Survey

Afreen

Badugu

2019

Learning and Analytics in Intelligent Systems
