The quantity of information on the internet is growing massively, and large volumes of openly accessible data in many formats are becoming more widespread. It is now challenging for a user to extract information efficiently. As one method of tackling this challenge, text summarization removes redundant information and retrieves the useful, relevant information from a text document to form a shorter, compressed version that is easy to understand and saves time while still reflecting the main idea of the document. Automatic text summarization has attracted keen interest within the text mining and natural language processing (NLP) communities because manually summarizing a text document is laborious. There are two main types of text summarization: extractive and abstractive. This paper focuses on extractive summarization using K-means clustering with TF-IDF (Term Frequency-Inverse Document Frequency). The paper also addresses the idea of the true K and uses that value of K to divide the sentences of the input document into clusters from which the final summary is formed. Furthermore, we combine K-means and TF-IDF with the estimation of the K value to produce the system summary, which shows comparatively better results.
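The pipeline described in this abstract (TF-IDF sentence vectors, K-means over the sentences, one summary per value of K) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the tokenizer, the smoothed IDF formula, the deterministic farthest-first initialization, and the rule of picking the sentence nearest each centroid are all assumptions, and the names `tokenize`, `tfidf_vectors`, `kmeans`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences):
    """Build one L2-normalized TF-IDF vector per sentence (smoothed IDF, an assumption)."""
    docs = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted({term for doc in docs for term in doc})
    n = len(docs)
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vec = [(doc[t] / total) * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50):
    """Plain K-means with farthest-first initialization (assumed, for determinism)."""
    centroids = [vectors[0][:]]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(far[:])
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize(sentences, k):
    """Return up to k sentences, one per cluster, in their original order."""
    if not sentences:
        return []
    k = max(1, min(k, len(sentences)))
    vectors = tfidf_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            # Assumed selection rule: the sentence closest to the cluster centroid.
            chosen.append(min(members, key=lambda i: sq_dist(vectors[i], centroids[j])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k)` with a list of sentence strings returns at most `k` of them. The value of `k` is left as a parameter here because the abstract treats estimating it as a separate problem.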
In data mining, clustering is the process of assigning a set of items to groups of similar items, called clusters. Document clustering has been a rapidly developing research area for decades and is considered a vital task in text mining because of the exceptional growth in the number of documents online. It provides the opportunity to organize a large amount of scattered text into meaningful clusters and lays the foundation for smooth descriptive browsing and navigation systems. One of the most commonly used partitioning algorithms is k-means, which is frequently used for text clustering because it converges to a local optimum even on very large sparse matrices. Its objective is to make the distances between the data points belonging to the same cluster as short as possible. This paper explores how partitional (k-means) clustering works for text document clustering and, in particular, examines one of the basic disadvantages of k-means: determining the true value of K. The notion of the true K is easy to understand, but automatically selecting a suitable value for K is a hard algorithmic problem. The true K tells us how many clusters should be formed in a dataset, but it is often ambiguous and the question has no single answer, although many variants of k-means have been proposed to estimate its value. Besides these variants, a range of probing techniques has been proposed by multiple researchers to determine it. This paper explains how to apply some of these techniques to find the true value of K in a text dataset.
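The abstract does not name the probing techniques it applies. As one commonly used example, the mean silhouette score can be computed for each candidate K and the highest-scoring K kept. The sketch below assumes that technique, uses plain Python on small numeric points rather than a text dataset, and uses a deterministic farthest-first initialization; `kmeans`, `silhouette`, and `estimate_k` are hypothetical names.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100):
    """K-means with farthest-first initialization; returns one label per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged to a local optimum
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # the score is undefined for a single cluster
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def estimate_k(points, k_max):
    """Score every K in 2..k_max and return the best K with all scores."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scored, key=scored.get), scored
```

For text, the points would be sentence or document vectors such as TF-IDF rows. The silhouette computation is quadratic in the number of points, so this sketch suits small collections only.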
Knowing the exact number of clusters in a digital image significantly helps in clustering the image precisely. This paper proposes a new technique for extracting the exact number of clusters from greyscale images. It analyzes the contents of the input image and adaptively reserves one distinct cluster for each distinct grey value. The total count of grey values found in an image determines the exact number of clusters. Because it depends on the contents of the image, this number of clusters changes from image to image. Once obtained, this number is given as input to a Gaussian Mixture Model (GMM), which clusters the image. GMM works with a finite number of clusters and forms a mixture of the various spectral densities contained in the image. The proposed method allows GMM to adapt itself to the changing number of clusters. The proposed model, together with GMM, is therefore named the Adaptive Finite Gaussian Mixture Model (AFGMM). The clustering performance of AFGMM is evaluated using Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR). Both performance measures confirmed that the exact number of clusters is essential for reliably analyzing an image.
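The idea of counting distinct grey values and passing that count to a GMM can be illustrated with a small one-dimensional sketch. The abstract gives no implementation details, so everything below is an assumption: the EM loop, the initialization of each component mean at one grey value, the variance floor, the 8-bit peak value of 255 in the PSNR, and the hypothetical names `grey_levels`, `fit_gmm_1d`, and `cluster_image`.

```python
import math


def grey_levels(image):
    """Return the sorted distinct grey values in a 2-D list of pixel intensities."""
    return sorted({pixel for row in image for pixel in row})


def fit_gmm_1d(values, init_means, iterations=20, var_floor=0.25):
    """Fit a 1-D Gaussian mixture by EM; returns the means and final responsibilities."""
    k, n = len(init_means), len(values)
    means = [float(m) for m in init_means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibilities computed in log space to avoid underflow.
        resp = []
        for x in values:
            logs = [math.log(weights[j]) - 0.5 * math.log(2 * math.pi * variances[j])
                    - (x - means[j]) ** 2 / (2 * variances[j]) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: update weights, means, and variances (variance floored, an assumption).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var_floor, spread)
    return means, resp


def cluster_image(image):
    """Pick K as the number of distinct grey values, fit the GMM, report MSE and PSNR."""
    levels = grey_levels(image)
    values = [pixel for row in image for pixel in row]
    means, resp = fit_gmm_1d(values, levels)
    flat = [max(range(len(levels)), key=lambda j: r[j]) for r in resp]
    width = len(image[0])
    labels = [flat[i:i + width] for i in range(0, len(flat), width)]
    # Reconstruct each pixel as its component mean and compare with the original.
    mse = sum((x - means[lab]) ** 2 for x, lab in zip(values, flat)) / len(values)
    psnr = math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)
    return len(levels), labels, mse, psnr
```

In this sketch K grows with the number of distinct intensities, up to 256 for an 8-bit image, and the runtime grows with it.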