2004
DOI: 10.1109/tsa.2004.828701

Automatic Indexing of Lecture Presentations Using Unsupervised Learning of Presumed Discourse Markers

Abstract: A new method for automatic detection of section boundaries and extraction of key sentences from lecture audio archives is proposed. The method makes use of 'discourse markers' (DMs), which are characteristic expressions used in initial utterances of sections, together with pause and language model information. The DMs are derived in a totally unsupervised manner based on word statistics. An experimental evaluation using the Corpus of Spontaneous Japanese (CSJ) demonstrates that the proposed method provides bet…
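The abstract says the DMs are derived without supervision from word statistics but does not state the statistic. The sketch below shows one hypothetical way to do it and is not the authors' formula. It treats utterances preceded by a long pause as presumed section-initial, then ranks their first words by how over-represented they are there compared with all utterances. The pause threshold, the restriction to the first word, the log-ratio score, and the function name `extract_presumed_dms` are all assumptions.

```python
import math
from collections import Counter


def extract_presumed_dms(utterances, pause_threshold=1.0, min_count=3, top_k=10):
    """Rank utterance-initial words as presumed discourse markers (hypothetical scheme).

    utterances: list of (pause_before_seconds, [word, ...]) pairs.
    Utterances preceded by a pause >= pause_threshold are *assumed* to be
    section-initial; no manual labels are used.
    """
    initial = Counter()  # first-word counts in presumed section-initial utterances
    overall = Counter()  # first-word counts over all utterances
    n_initial = 0
    for pause, words in utterances:
        if not words:
            continue
        first = words[0]
        overall[first] += 1
        if pause >= pause_threshold:
            initial[first] += 1
            n_initial += 1
    if n_initial == 0:
        return []
    n_total = sum(overall.values())
    scores = {}
    for word, count in initial.items():
        if count < min_count:
            continue  # skip rare words whose ratio would be unreliable
        # Log ratio of relative frequency after long pauses vs. over all utterances.
        scores[word] = math.log((count / n_initial) / (overall[word] / n_total))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Usage on toy (pause, words) data:
utts = [(2.0, ["well", "today"]), (1.5, ["well", "next"]), (3.0, ["now", "let"]),
        (0.2, ["the", "model"]), (0.1, ["the", "data"]), (0.3, ["well", "so"]),
        (2.5, ["now", "we"])]
print([w for w, _ in extract_presumed_dms(utts, min_count=2)])  # → ['now', 'well']
```

Here "now" outranks "well" because "well" also opens an utterance with no preceding pause, which dilutes its ratio.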

Cited by 15 publications (8 citation statements)
References 12 publications

“…Haubold and Kender [10,11] focus on multi-speaker presentation videos and develop an enhanced multimodal segmentation that leverages the audio stream to detect speaker changes. The use of audio for segmentation has also been studied in [14] for lectures and in more general purpose video retrieval systems [13,23]. Our work also employs visual frame differencing as a baseline, which we extend with both spatial analysis and speaker appearance modeling to reduce the number of non-slide images in the final set of keyframes.…”
Section: Related Work | Citation type: mentioning | Confidence: 99%
“…The hypotheses of sentence boundaries are verified using the N-gram model. The method is also formulated as a statistical machine translation framework [22] and is referred to as enhanced SLM.…”
Section: Statistical Language Model (SLM) | Citation type: mentioning | Confidence: 99%
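The N-gram verification step in the excerpt above can be illustrated with a toy check: a boundary hypothesis between two words is accepted if inserting a boundary token makes the local word sequence more probable under the language model. This is a minimal sketch that assumes an add-one-smoothed bigram model, a hypothetical boundary token `<b>`, and a hypothetical tunable `bias`. It is not the cited system's implementation, and the statistical machine translation formulation [22] is not reproduced.

```python
import math
from collections import Counter

BOUNDARY = "<b>"  # hypothetical sentence-boundary token


class BigramLM:
    """Add-one-smoothed bigram model; a stand-in for whatever N-gram LM is used."""

    def __init__(self, sentences):
        self.history = Counter()  # how often each token appears as a bigram history
        self.bigrams = Counter()
        vocab = set()
        for sent in sentences:
            tokens = [BOUNDARY] + list(sent) + [BOUNDARY]
            vocab.update(tokens)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def logprob(self, prev, word):
        """log P(word | prev) with add-one smoothing."""
        num = self.bigrams[(prev, word)] + 1
        den = self.history[prev] + self.vocab_size
        return math.log(num / den)


def verify_boundary(lm, left, right, bias=0.0):
    """Accept a boundary between `left` and `right` if the sequence
    left <b> right scores higher than left right (plus an optional bias)."""
    with_boundary = lm.logprob(left, BOUNDARY) + lm.logprob(BOUNDARY, right)
    without_boundary = lm.logprob(left, right)
    return with_boundary + bias > without_boundary


# Usage on a toy training set:
lm = BigramLM([["we", "start"], ["we", "continue"], ["so", "we", "start"]])
print(verify_boundary(lm, "start", "we"))  # → True
print(verify_boundary(lm, "we", "start"))  # → False
```

A positive `bias` makes the check more willing to accept boundaries. A real system would use a higher-order model trained on a large transcribed corpus.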
“…More relevant information can be extracted from the audio. In [15], the analysis of prosody and silences as well as the detection of Discourse Markers, i.e. expressions that typically introduce a new argument, are used to segment and index university lectures.…”
Section: Previous Work | Citation type: mentioning | Confidence: 99%
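The excerpt above describes segmentation driven by silences and Discourse Markers. One simple way to approximate this is to score each utterance on the pause before it and on whether it opens with a DM, then flag utterances whose score passes a threshold. The linear score, its weights, the threshold, and the function name `detect_section_boundaries` are hypothetical choices for illustration. Prosodic cues other than pause length, and any language-model check, are left out.

```python
def detect_section_boundaries(utterances, dms, pause_weight=1.0, dm_weight=1.0,
                              threshold=1.5):
    """Return indices of utterances hypothesized to start a new section.

    utterances: list of (pause_before_seconds, [word, ...]) pairs.
    dms: iterable of discourse-marker words (e.g. learned without supervision).
    Score = pause_weight * pause + dm_weight * [utterance starts with a DM];
    this linear combination is an assumption, not the published model.
    """
    dm_set = set(dms)
    boundaries = []
    for i, (pause, words) in enumerate(utterances):
        starts_with_dm = bool(words) and words[0] in dm_set
        score = pause_weight * pause + dm_weight * (1.0 if starts_with_dm else 0.0)
        if score >= threshold:
            boundaries.append(i)
    return boundaries


# Usage on toy data:
utts = [(0.0, ["well", "today"]), (0.8, ["the", "model"]), (0.7, ["now", "we"]),
        (2.0, ["the", "data"]), (0.3, ["so", "then"])]
print(detect_section_boundaries(utts, dms=["well", "now"]))  # → [2, 3]
```

With these settings, neither a DM alone (utterance 0) nor a short pause alone (utterance 1) is enough. A DM after a moderate pause (utterance 2) or a long pause alone (utterance 3) triggers a boundary.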