2018
DOI: 10.1017/s1351324918000359

MUSED: A multimedia multi-document dataset for topic segmentation

Abstract: Research on topic segmentation has recently focused on segmenting documents by taking advantage of documents covering the same topics. In order to properly evaluate such approaches, a dataset of related documents is needed. However, existing datasets are limited in the number of related documents per domain. In addition, most of the available datasets do not consider documents from different media sources (PowerPoints, videos, etc.), which pose specific challenges to segmentation. We fill this gap with the MUl…

Cited by 2 publications (1 citation statement)
References 23 publications (54 reference statements)
“…Among the current mainstream CWS methods, the supervised character‐based tagging approach has yielded good word segmentation results (Mota et al., 2018; Üstün et al., 2021; Wei et al., 2021). However, this method requires a large number of well‐labeled corpora, and the word separation effect is generally better when the training corpus is tested with the same domain corpus.…”
Section: Introduction (mentioning)
confidence: 99%