2015 IEEE International Conference on Data Mining Workshop (ICDMW)
DOI: 10.1109/icdmw.2015.6

OntoSeg: A Novel Approach to Text Segmentation Using Ontological Similarity

Abstract: Text segmentation (TS) aims at dividing long text into coherent segments which reflect the subtopic structure of the text. It is beneficial to many natural language processing tasks, such as Information Retrieval (IR) and document summarisation. Current approaches to text segmentation are similar in that they all use word-frequency metrics to measure the similarity between two regions of text, so that a document is segmented based on the lexical cohesion between its words. Various NLP tasks are now mo…
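As an illustration of the word-frequency baseline that the abstract contrasts with ontological similarity, the sketch below segments text by comparing adjacent sentence blocks with cosine similarity over raw word counts and placing boundaries at similarity dips (TextTiling-style). It is a minimal sketch, not the paper's method; the block size and depth threshold are illustrative choices.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-frequency vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def lexical_cohesion_boundaries(sentences, block_size=3, depth_threshold=0.1):
    # Place a boundary after sentence i when the similarity between the blocks
    # of sentences before and after i forms a sufficiently deep local minimum.
    counts = [Counter(s.lower().split()) for s in sentences]
    gaps = []
    for i in range(block_size, len(counts) - block_size + 1):
        left = sum(counts[i - block_size:i], Counter())
        right = sum(counts[i:i + block_size], Counter())
        gaps.append((i, cosine(left, right)))
    boundaries = []
    for k, (i, score) in enumerate(gaps):
        window = [s for _, s in gaps[max(0, k - 1):k + 2]]
        if score == min(window) and max(window) - score >= depth_threshold:
            boundaries.append(i)  # segment break after sentence i
    return boundaries
```

A segmenter like this only sees surface word overlap, which is exactly the limitation ontological similarity is meant to address: two blocks about the same topic but with different vocabulary score as unrelated.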

Cited by 13 publications (10 citation statements)
References 29 publications
“…Later, supervised methods included topic models (Riedl and Biemann, 2012), calculating a coherence score using dense topic vectors obtained by LDA. Bayomi et al. (2015) exploited ontologies to measure semantic similarity between text blocks. Alemi and Ginsparg (2015) and Naili et al. (2017) studied how word embeddings can improve classical segmentation approaches.…”
Section: Related Work
confidence: 99%
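To make the ontological-similarity idea concrete, here is a hedged sketch that scores the semantic relatedness of two text blocks with WordNet path similarity via NLTK. It illustrates the general technique the citing papers describe, not the implementation of Bayomi et al. (2015); it assumes the NLTK WordNet corpus is installed (`python -m nltk.downloader wordnet`), and the block-averaging scheme is an illustrative choice.

```python
from nltk.corpus import wordnet as wn  # requires the WordNet corpus to be downloaded

def word_similarity(w1: str, w2: str) -> float:
    # Best path similarity over all synset pairs of the two words; 0.0 if none is comparable.
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def block_similarity(block_a: list[str], block_b: list[str]) -> float:
    # Symmetric score: for each word, take its best match in the other block, then average.
    def directed(src, tgt):
        return sum(max((word_similarity(w, v) for v in tgt), default=0.0)
                   for w in src) / max(len(src), 1)
    return (directed(block_a, block_b) + directed(block_b, block_a)) / 2

# Adjacent blocks with a low ontological score suggest a segment boundary between them,
# e.g. block_similarity(["dog", "cat", "pet"], ["economy", "inflation"]) is low.
```

Unlike the word-frequency baseline, this scores "dog" and "puppy" as related even though the strings never overlap, which is the gain the citing papers attribute to ontology-based and embedding-based variants.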
“…This causes segment boundaries to be generated with the disadvantage that words are repeated throughout the segmentation process, as reported in existing works [35]. To overcome this issue, different approaches were proposed [36, 37], but they either do not rely on a training phase or are not applied directly to text data. Existing works have found comparable results using ontological similarity in conjunction with labelled data, but their document representation treats each text unit as a single piece of information and its thematic information is lost; thus, segments might not be related to, or labelled with, any topical information [38, 39].…”
Section: Related Work
confidence: 99%
“…segments), each of which is taken to be ‘speaking about’ a specific topic. However, traditional approaches [37–41] assume that topics underlie boundaries by default: they either consider the topical borderline to be drawn as a no man’s land between two distinct topical areas, or expect a large variance in the vocabulary to occur. Furthermore, these approaches limit the data to words rather than topics and are based on similarity or density measures, thus losing the discourse and semantic information contained in the text units.…”
Section: Document Representation and Scoring
confidence: 99%
“…The domains were selected at random and the abstracts were randomly chosen from within the following domains: Accidents, Natural Disasters, Politics, Famous People, Sports, and Animals. Two summaries were generated for each abstract: one without applying Anaphora Resolution, and the second by applying Anaphora Resolution before summarisation. When we generated the two summaries for each article, we noticed that in 45 articles the two summaries were identical, which reflected the possibility that AR may have had no impact on the produced summaries.…”
Section: Dataset
confidence: 99%
“…Natural Language Processing (NLP) comprises different tasks [1, 2]. One of these tasks is Automatic Text Summarisation (TS), which has attracted considerable interest in the NLP community in recent years [2].…”
Section: Introduction
confidence: 99%