Irene Cozzolino scite author profile

Irene Cozzolino

1 Publication

0 Citation Statements Received

57 Citation Statements Given

Affiliations

Sapienza University of Rome

Publications

Document clustering

Cozzolino

Ferraro

2022

WIREs Computational Stats

The explosive growth in text data emphasizes the need for new, computationally efficient methods, with credible theoretical support, tailored to analyzing such large-scale data. Most of this vast amount of unstructured data is not classified, so unsupervised learning techniques prove useful in this field. Document clustering has proven to be an efficient tool for organizing textual documents and has been widely applied in areas ranging from information retrieval to topic modeling. Before the proposed document clustering algorithms are introduced, the principal steps of the whole process are discussed, including the mathematical representation of documents and the preprocessing phase. Then the main clustering algorithms used for text data are critically analyzed, considering prototype-based, graph-based, hierarchical, and model-based approaches.

This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Statistical Learning and Exploratory Methods of the Data Sciences > Text Mining; Data: Types and Structure > Text Data

Keywords: document clustering, document representation, graph-based methods, hierarchical methods, model-based methods, prototype-based methods, text data
