Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics 2016
DOI: 10.18653/v1/s16-2016
Unsupervised Text Segmentation Using Semantic Relatedness Graphs

Abstract: Segmenting text into semantically coherent fragments improves readability of text and facilitates tasks like text summarization and passage retrieval. In this paper, we present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic relatedness graph of the document. Semantically coherent segments are then derived from maximal cliques of the relatedness graph. The algorithm performs competitively…
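The abstract outlines a pipeline of sentence relatedness graph, maximal cliques, and segments. The snippet below is a minimal sketch of that idea, not the authors' implementation: toy vectors and plain cosine similarity stand in for real word embeddings and the paper's relatedness measure of short texts, and a simplified rule turns maximal cliques into contiguous segments (adjacent sentences that share a maximal clique stay in one segment). The threshold and the toy corpus are illustrative assumptions.

```python
# Minimal sketch of graph-based linear text segmentation (not the authors' code).
import numpy as np
import networkx as nx


def sentence_vector(tokens, embeddings, dim=50):
    """Average the word vectors of a sentence (stand-in for real embeddings)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def relatedness(u, v):
    """Cosine similarity as a placeholder for the paper's relatedness measure."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def segment(sentences, embeddings, threshold=0.3):
    """Build a sentence relatedness graph and cut the document wherever no
    maximal clique covers two adjacent sentences."""
    vectors = [sentence_vector(s, embeddings) for s in sentences]

    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if relatedness(vectors[i], vectors[j]) >= threshold:
                graph.add_edge(i, j)

    # Adjacent sentences that share a maximal clique stay in the same segment.
    covered = set()
    for clique in nx.find_cliques(graph):
        clique = sorted(clique)
        for a, b in zip(clique, clique[1:]):
            if b == a + 1:
                covered.add(a)

    segments, current = [], [0]
    for i in range(1, len(sentences)):
        if i - 1 in covered:
            current.append(i)
        else:
            segments.append(current)
            current = [i]
    segments.append(current)
    return segments


if __name__ == "__main__":
    # Toy "embeddings": two topics, each a base vector plus small noise.
    rng = np.random.default_rng(0)
    animal, finance = rng.normal(size=50), rng.normal(size=50)
    vocab = {w: animal + 0.1 * rng.normal(size=50) for w in ("cats", "dogs", "pets")}
    vocab.update({w: finance + 0.1 * rng.normal(size=50) for w in ("economy", "stocks", "markets")})

    doc = [["cats", "dogs"], ["dogs", "pets"], ["stocks", "markets"], ["economy", "stocks"]]
    print(segment(doc, vocab))  # expected: two segments, [[0, 1], [2, 3]]
```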

Cited by 77 publications (84 citation statements); References 17 publications.
“…Alemi and Ginsparg (2015) and Naili et al. (2017) studied how word embeddings can improve classical segmentation approaches. Glavaš et al. (2016) utilized semantic relatedness of word embeddings by identifying cliques in a graph.…”
Section: Related Work
confidence: 99%
“…By treating source code (https://stackoverflow.com) as a body of text akin to an NLP problem, we avoid any programming-language-specific challenges posed by other methods. Text segmentation has been researched more thoroughly than the source code analogue, with methods ranging from LDA [10], to semantic relatedness graphs [3], to deep learning approaches [1]. Of particular note is the use of bidirectional LSTMs to identify the breaks between segments of Wikipedia articles [8].…”
Section: Related Work
confidence: 99%
“…It calculates an error rate between 0 and 1 for predicting borders (0 indicates a perfect prediction), penalizing near-misses less than complete misses or extra borders. Depending on the problem types and data sets used, text segmentation approaches report near-perfect WindowDiff values of less than 0.01, while in other circumstances the error rate can exceed 0.6 [6]. A more recent adaptation of the WindowDiff metric is the WinPR metric [28].…”
Section: Style Breach Detection
confidence: 99%
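The last citation statement refers to the WindowDiff error rate (Pevzner and Hearst, 2002) commonly used to evaluate text segmentations. Below is a small, self-contained sketch of the metric; the 0/1 boundary encoding, the window indexing, and the example sequences are illustrative assumptions, since published implementations differ slightly in these conventions.

```python
# Sketch of the WindowDiff error rate for comparing two segmentations.
# Segmentations are 0/1 sequences, where 1 marks a boundary after that unit.

def windowdiff(reference, hypothesis, k):
    """Fraction of length-k windows in which the two segmentations
    disagree on the number of boundaries (0 = perfect, 1 = worst)."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must have equal length")
    n = len(reference)
    if n <= k:
        raise ValueError("window size k must be smaller than the sequence length")

    errors = 0
    for i in range(n - k):
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / (n - k)


if __name__ == "__main__":
    gold = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    near_miss = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # one boundary shifted by one unit
    far_miss = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # boundaries far from the gold ones

    k = 3  # typically half the average gold segment length
    print(windowdiff(gold, near_miss, k))  # lower error: near-miss penalised lightly
    print(windowdiff(gold, far_miss, k))   # higher error: boundaries badly misplaced
```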