Statistical language modeling involves techniques and procedures that assign probabilities to word sequences or, in other words, estimate the regularity of the language. This paper presents the basic characteristics of statistical language models, reviews their use in a broad range of speech and language applications, gives their formal definition and describes different types of language models. A detailed overview of n-gram and class-based models (as well as their combinations) is given chronologically, by type and complexity of the models, and with respect to their use in different NLP applications for different natural languages. The proposed experimental procedure compares three types of statistical language models: n-gram models based on words, categorical models based on automatically determined categories and categorical models based on POS tags. In the paper, we propose a language model for contemporary Croatian texts and a procedure for determining the best n-gram order and the optimal number of categories, which leads to a significant decrease in language model perplexity, estimated from the Croatian News Agency (HINA) articles corpus. Using different language models estimated from the HINA corpus, we show experimentally that models based on categories describe the natural language better than those based on words. The findings of the proposed experiment are applicable not only to Croatian but also to similar highly inflectional languages with rich morphology and non-mandatory sentence word order.
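The perplexity measure used to compare the models above can be illustrated with a minimal sketch. The following is not the paper's actual setup (the real models are estimated from the HINA corpus); it is a toy bigram model with add-one (Laplace) smoothing, where the corpus, vocabulary and smoothing choice are illustrative assumptions:

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real model would be trained on HINA articles.
corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams and bigrams.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def bigram_prob(w1, w2):
    """Add-one (Laplace) smoothed bigram probability P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)

def perplexity(words):
    """Perplexity of a word sequence under the bigram model."""
    log_prob = sum(math.log(bigram_prob(w1, w2))
                   for w1, w2 in zip(words, words[1:]))
    n = len(words) - 1  # number of bigram predictions made
    return math.exp(-log_prob / n)

# A sequence seen in training scores a lower (better) perplexity
# than one composed of unseen bigrams.
print(perplexity("the cat sat".split()))
print(perplexity("mat sat ate".split()))
```

A lower perplexity indicates that the model assigns higher probability to the test sequence, i.e. describes the language better; this is the criterion by which the word-based and category-based models in the paper are compared.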