Text clustering is gaining importance among researchers because of the rapid increase in the availability of online text collections without class labels. It helps to organize, summarize, and retrieve useful information from corpora. The high dimensionality of text datasets leads to poor performance of clustering algorithms. Dimensionality can be reduced using feature extraction or feature selection methods. Feature selection methods scale well and are easy to interpret because they retain a subset of the original terms rather than constructing new features. An unsupervised univariate filter feature selection method was proposed for dimensionality reduction. The proposed method outperformed nine other filter methods reported in the literature by identifying the most relevant features, which led to good clustering performance on eight popular text datasets.