Multi-document extractive text summarization: A comparative assessment on features

Mutlu, Begüm; Sezer, Ebru Akçapınar; Akçayol, M. Ali

doi:10.1016/j.knosys.2019.07.019

Cited by 42 publications

(15 citation statements)

References 30 publications


“…[26], [37]. The English language MA is based on the use of finite state machines and finite state transducer [23], [24], [31].…”

Section: B Dictionary-based Approaches (mentioning)

confidence: 99%

“…The words found in every hundred documents and occurs less often have IDF>2. Almost all topics characterization likelihood, by certain words, has IDF close to 2 [24], [25], [44]. TF-IDF gives maximum value if rare words have many occurrences in the document [26], [29].…”

Section: Text Relevance By Term Frequency-inverse Document Frequen… (mentioning)

confidence: 99%
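
As a rough illustration of the IDF figures in the statement above, the sketch below assumes the common base-10 definition idf = log10(N / df) and a TF-IDF weight of tf × idf. The quoted paper does not state which variant it uses, and the function names and toy counts are hypothetical.

```python
import math


def idf(num_docs, doc_freq):
    """Base-10 inverse document frequency (an assumed, common variant)."""
    return math.log10(num_docs / doc_freq)


def tf_idf(term_count, num_docs, doc_freq):
    """TF-IDF weight as raw term count times IDF (also an assumption)."""
    return term_count * idf(num_docs, doc_freq)


# A term found in 1 of every 100 documents sits at the IDF = 2 boundary;
# rarer terms score higher, more common terms score lower.
boundary = idf(100, 1)
rarer = idf(1000, 1)
common = idf(100, 50)
print(boundary, rarer, common)

# For a fixed rare term, more occurrences in a document raise its weight.
print(tf_idf(1, 1000, 1), tf_idf(5, 1000, 1))
```

Under this definition, the threshold of 2 corresponds to a word that appears in one document out of a hundred, which matches the statement that less frequent words have IDF > 2.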

“…In literature, researchers study this problem of text representation as two strategies: abstraction & extraction. Abstractive or metadata processing technique gives a keywords-based theme about the text as abstractive summarization [13], [24]. Collective metadata of similar documents leads to a relevant set of documents rather than processing documents with individual metadata.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, still, it has vast application over those documents which contain relatively short unstructured text. It is very much suitable for tasks such as headlines, various documents title, website or research paper keywords [5], [21], sentence compression [13], [19], [24], and sentence fusion [11], [17]. The keyword selection has many concerns and constraints, and as per literature, one cannot guarantee it as stand-alone the comprehensive text representation.…”

Section: Introduction (mentioning)

confidence: 99%


Unstructured Text Documents Summarization With Multi-Stage Clustering

et al. 2020


In natural language processing, text summarization is an important application used to extract desired information by reducing large text. Existing studies use keyword-based algorithms for grouping text, which do not capture the documents' actual theme. Our proposed dynamic corpus creation mechanism combines metadata with summarized extracted text. The proposed approach analyzes the mesh of multiple unstructured documents and generates a linked set of multiple weighted nodes by applying multi-stage clustering. We have generated adjacency graphs to link the clusters of various collections of documents. This approach comprises ten steps: pre-processing, making multiple corpuses, first-stage clustering, creating sub-corpuses, interlinking sub-corpuses, creating a Page-Rank keyword dictionary of each sub-corpus, second-stage clustering, path creation among clusters of sub-corpuses, and text processing by forward and backward propagation for results generation. The outcome of this technique consists of sub-corpuses interlinked through clusters. We have applied our approach to a news dataset; this interlinked corpus processing follows step-by-step clustering to search the most relevant parts of the corpus with less cost and time and with improved content detection. We have applied six different metadata processing combinations over multiple text queries to compare results during our experimentation. The comparison results of text satisfaction show that Page-Rank keywords give 38% related text, single-stage clustering gives 46%, two-stage clustering gives 54%, and the proposed technique gives 67% associated text. Furthermore, this approach covers the relevant data across a range from most to less relevant content. It provides a systematic query-relevant corpus processing mechanism, which automatically selects the most relevant sub-corpus through dynamic path selection. We used the SHAP model to evaluate the proposed technique, and our evaluation results showed that the proposed mechanism improved text processing. Moreover, combining text summarization features showed satisfactory results compared to the summaries generated by general models of abstractive and extractive summarization.



“…The authors could add features that were relevant to the subject to the document defined by feature set to enhance the classification of the text. The authors [15] explored various forms of terms frequency and topic-related data, and these were considered traits for supporting vector machine. The experimental results on three companies showed that the accuracy of text classification could be improved by combined features.…”

Section: Related Work (mentioning)

confidence: 99%

A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets

Devi, Kumar

2020

IJACSA


As document collections grow day by day, finding essential document clusters for classification is a major problem because of high inter- and intra-document variation. Also, most conventional classification models, such as SVM, neural network, and Bayesian models, have a high true negative rate and error rate in document classification. To improve the computational efficacy of traditional document classification models, a hybrid feature-extraction-based document clustering approach and classification approaches are developed for large document sets. In the proposed work, a hybrid GloVe feature selection model is proposed to improve the contextual similarity of the keywords in a large document corpus. A hybrid document clustering similarity index is optimized to find the essential key document clusters based on the contextual keywords. Finally, a hybrid document classification model is used to classify the clustered documents on a large corpus. Experiments on different datasets show that the proposed document clustering-based classification model has a higher true positive rate and accuracy and a lower error rate than the conventional models.


Intelligent deep learning‐based hierarchical clustering for unstructured text data

Jyothi

Lingamgunta

Eluri

2022

Concurrency and Computation


Document clustering is a technique used to split a collection of textual content into clusters or groups. Spectral clustering is now widely used in the machine learning domain. By using a selection of text mining algorithms, the diverse features of unstructured content are captured, resulting in rich descriptions. The main aim of this article is to present a novel unstructured text data clustering approach based on a developed natural language processing technique. The proposed model has three stages: preprocessing, feature extraction, and clustering. Initially, the unstructured data is preprocessed by techniques such as punctuation and stop word removal, stemming, and tokenization. Then, the features are extracted by word2vector using the continuous bag-of-words model and by term frequency-inverse document frequency. Next, hierarchical clustering is performed on the extracted features, with the cut-off distance optimized by the improved sensing area-based electric fish optimization (FISA-EFO). A deep neural network tuned by the same algorithm is used to improve the clustering model. The results reveal that the model provides better clustering accuracy than other clustering techniques when handling unstructured text data.

Keywords: fitness improved sensing area-based electric fish optimization, hierarchical clustering, tuned deep neural network, unstructured text data clustering

Introduction

Generally, humans read speech and text data easily, but machine learning and statistical modeling applications must deal with unstructured data, so some alterations to the coded input feature sets are necessary [1]. Data clustering is a technique used for splitting data elements into groups so that the elements in the same group have the highest similarity. Based on the cluster's attributes, however, elements in other groups are dissimilar. The major aim of clustering techniques is to obtain centroids or cluster centers that characterize the entire cluster. Clustering techniques have been classified into categories such as "density-based methods, grid-based methods, partitioning methods, and hierarchical methods" [2,3]. Moreover, a data set is defined as categorical or numerical. The primary statistical features of numeric data are used to describe the distance function between data elements. Categorical data is imitated from qualitative and quantitative data, and its descriptions are obtained from counts [4]. With a "textual virtual schematic model" (TVSM), textual data are assigned to clusters in three steps. Initially, unstructured data is extracted from the data source and converted into structured data [5]. After that, clustering is implemented on the structured data. Finally, documents are compared to enhance query performance in terms of accuracy. Day-to-day life generates a huge amount of unstructured text...
