Text document summarization is critical for managing today’s vast
textual data. This paper presents an approach to text document
summarization that does not rely on word embedding techniques. Instead,
our method follows a four-step pipeline: sentence
segmentation, sentence embedding, K-means clustering, and summary
generation. The input text is segmented into individual sentences using
an NLP tool such as NLTK’s sentence tokenizer. Next, we extract
contextual embeddings for each sentence using a Sentence Transformer
model. These embeddings capture the meaning of each sentence within the
context of the surrounding text. The sentence embeddings are then
clustered with K-means, which groups semantically
related sentences into clusters. To generate the
summary, we select from each cluster the sentence closest to its
centroid, and the selected sentences are then ordered as they appeared
in the original text. We
implemented the summarizer and evaluated its performance on the DUC 2007
dataset, a collection of news articles with reference summaries crafted
by human experts. The results demonstrate that our summarizer produces
informative and concise summaries, surpassing a baseline approach that
solely extracts top-ranked sentences from the input text. Our work
contributes to text document summarization by presenting an alternative
approach that does not rely on word embedding techniques. By leveraging
sentence segmentation, contextual embeddings, K-means clustering, and
centroid-based selection, our method offers a viable solution for
generating high-quality summaries. Further research can explore
enhancements to our approach and its application in various domains
where text summarization is essential.
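The pipeline described above can be sketched end to end. To keep the example self-contained and runnable, a toy bag-of-words embedding and a small hand-rolled Lloyd's K-means stand in for the Sentence Transformer encoder, NLTK tokenizer, and scikit-learn KMeans used in the actual system; only the control flow (segment, embed, cluster, pick the sentence nearest each centroid, restore original order) mirrors the method, and all helper names here are illustrative.

```python
import re
import numpy as np

def segment(text):
    # Naive regex sentence splitter; the paper uses NLTK's sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def embed(sentences):
    # Toy bag-of-words vectors standing in for Sentence Transformer embeddings.
    vocab = sorted({w for s in sentences for w in re.findall(r"\w+", s.lower())})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = np.zeros((len(sentences), len(vocab)))
    for row, s in enumerate(sentences):
        for w in re.findall(r"\w+", s.lower()):
            vecs[row, index[w]] += 1.0
    return vecs

def kmeans(X, k, iters=50, seed=0):
    # Minimal Lloyd's algorithm; scikit-learn's KMeans would be used in practice.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute centroids.
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def summarize(text, k):
    sentences = segment(text)
    X = embed(sentences)
    labels, centroids = kmeans(X, k)
    picked = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members) == 0:
            continue
        # Choose the sentence with the lowest distance to the cluster centroid.
        dists = np.linalg.norm(X[members] - centroids[j], axis=1)
        picked.append(members[np.argmin(dists)])
    # Restore the original document order of the selected sentences.
    return " ".join(sentences[i] for i in sorted(picked))
```

In this sketch the number of clusters `k` directly controls the summary length, one sentence per cluster, which is the design choice the abstract describes.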