Text Segmentation by Cross Segment Attention

Łukasik, Michał; Dadachev, Boris; Simões, Gonçalo; Papineni, Kishore

doi:10.48550/arxiv.2004.14535

Cited by 4 publications

(4 citation statements)

References 0 publications

“…These dynamic word representations, extracted from pre-trained models such as those presented in [3], [21]-[23], greatly outperformed their static predecessors in various NLP tasks [24]. In the field of topic segmentation, the introduction of dynamic word representations has also produced results superior to those of previous approaches [25]-[27].…”

Section: Topic Segmentation (mentioning)

confidence: 99%

Auxiliary Loss for BERT-Based Paragraph Segmentation

ZHUO

Murata

2023

IEICE Trans. Inf. & Syst.

Paragraph segmentation is a text segmentation task. Iikura et al. achieved excellent results on paragraph segmentation by introducing focal loss to Bidirectional Encoder Representations from Transformers. In this study, we investigated paragraph segmentation on Daily News and Novel datasets. Based on the approach proposed by Iikura et al., we used auxiliary loss to train the model to improve paragraph segmentation performance. Consequently, the average F1-score obtained by the approach of Iikura et al. was 0.6704 on the Daily News dataset, whereas that of our approach was 0.6801. Our approach thus improved the performance by approximately 1%. The performance improvement was also confirmed on the Novel dataset. Furthermore, the results of two-tailed paired t-tests indicated that the difference in performance between the two approaches was statistically significant.

“…Finally, a segment boundary is defined as an SRT block that exhibits both the maximum cosine similarity and surpasses a predefined similarity threshold. Furthermore, [12] introduces a cross-segment attention mechanism designed to identify significant boundaries within text through the capture of inter-segment relationships. This method, which considers context and connections between segments, shows promising outcomes in enhancing the accuracy and effectiveness of text segmentation.…”

Section: Video Segmentation Based On Text Algorithms (mentioning)

confidence: 99%

Semantic Segmentation of Educational Videos for Micro Learning Objects in Adaptive E-Learning

Halawa,

Gamalel-Din,

Nasr

2023

Journal of Al-Azhar University Engineering Sector

E-Learning is gaining prominence, especially in lifelong learning, primarily through lecture videos. However, these videos often encompass multiple topics or serve various instructional roles within a single subject. In adaptive e-Learning, the smaller and more granular the units, the more versatile the presentations and personalized lectures that can be composed from them. Such units are known as Micro Learning Objects (MLOs). Consequently, the need arises to segment these lecture videos into multiple MLOs, each fulfilling a distinct instructional role in a lecture. This article presents an automatic model leveraging advanced language models to segment lecture videos semantically into MLOs. Additionally, a new well-segmented dataset of educational videos (YT-EV) was introduced, in which each video is segmented according to a pre-defined timestamped agenda. The model is trained on general text datasets to understand LO segments and subsequently fine-tuned using transfer learning on video datasets to achieve better segmentation results. The experimental results showed an F1-score of 0.657, which is considered promising and emphasizes the significance of text transcript-based video segmentation for enhancing adaptive e-Learning.

“…SegBot [20] and [21] Sentence and document DisSim [22] Discourse sentence English Three BERT-style models [23] Discourse sentence and document…”

Section: Multilingual (mentioning)

confidence: 99%

“…Context-preserving approach [4] Simple sentence TopicDiff-LDA Latent Dirichlet Allocation the natural language understanding (NLU) approaches depending on artificial neural networks, particularly those adopted transformer-based models (e.g., [23]).…”

Section: Monolingual (mentioning)

confidence: 99%

Employing a Multilingual Transformer Model for Segmenting Unpunctuated Arabic Text

Alshanqiti,

Albouq,

Alkhodre

et al. 2022

Preprint

Long unpunctuated texts containing complex linguistic sentences are a stumbling block in processing low-resource languages. Thus, approaches that attempt to segment lengthy texts with no proper punctuation into simple candidate sentences are a vitally important preprocessing step in many hard-to-solve NLP applications. In this paper, we propose PDTS, a punctuation detection approach for segmenting Arabic text, built on top of a multilingual BERT-based model and some generic linguistic rules. Furthermore, we showcase how PDTS can be effectively employed as a text tokenizer for unpunctuated documents (i.e., documents that mimic audio-to-text transcripts). Experimental findings across two evaluation protocols (involving an ablation study and a human-based judgment) demonstrate that PDTS is practically effective in both performance quality and computational cost.
