Segmentation and Annotation of Audiovisual Recordings Based on Automated Speech Recognition

Repp, Stephan; Waitelonis, Jörg; Sack, Harald; Meinel, Christoph

doi:10.1007/978-3-540-77226-2_63

Cited by 20 publications

(7 citation statements)

References 11 publications


“…Due to the fact that the slides carried most of the information, Repp et al synchronized the imperfect transcript from the speech recognition engine automatically with the slide streams in post-processing [19]. Most approaches use out-of-the-box speech recognition engines which, for example, extract key phrases from spoken content [7].…”

Section: Related Work (mentioning)

confidence: 99%


Automatic Extraction of Semantic Descriptions from the Lecturer's Speech

Repp

Meinel

2009

2009 IEEE International Conference on Semantic Computing

Self Cite


The number of digital lecture video recordings has increased dramatically since recording technology became easier to use. The accessibility and ability to search within this large archive are limited and difficult. Manual annotation is time-consuming and therefore useless. A promising approach is based on using the audio layer of a lecture recording to obtain semantic information about the lecture's contents. The speech transcript and the words from the power point slides are sufficient to generate semantic metadata serialized in an OWL file. Two annotation methods are discussed, evaluated and compared to each other and to a perfectly annotated OWL file, as well as to an annotation based on a corrected transcript of the lecture.


Section: Related Work (mentioning)

confidence: 99%

“…So each LO has a duration of approximately 1.5 minutes. The synchronization between the power point slides and the erroneous transcript in a post-processing process is explored in [19] for the cases where no log file exists with time-stamps for each slide transition.…”

Section: Second Test: LO With the Slides (mentioning)

confidence: 99%

Automatic Extraction of Semantic Descriptions from the Lecturer's Speech

Repp

Meinel

2009

2009 IEEE International Conference on Semantic Computing

Self Cite


“…Chen et al [3] attempted to automatically synchronize presentation slides with the speaker video. Repp et al [10] proposed the segmentation and annotation of audiovisual recordings based on automated speech recognition. Recently, Bhatt et al [1] and Che et al [2] attempted to automatically determine the temporal segmentation and annotation for lecture videos.…”

Section: Related Work (mentioning)

confidence: 99%

Atlas

Shah

Shaikh

et al. 2014

Proceedings of the 22nd ACM International Conference on Multimedia


The number of lecture videos available is increasing rapidly, though there is still insufficient accessibility and traceability of lecture video contents. Specifically, it is very desirable to enable people to navigate and access specific slides or topics within lecture videos. To this end, this paper presents the ATLAS system for the VideoLectures.NET challenge (MediaMixer, transLectures) to automatically perform the temporal segmentation and annotation of lecture videos. ATLAS has two main novelties: (i) a SVM hmm model is proposed to learn temporal transition cues and (ii) a fusion scheme is suggested to combine transition cues extracted from heterogeneous information of lecture videos. According to our initial experiments on videos provided by VideoLectures.NET, the proposed algorithm is able to segment and annotate knowledge structures based on fusing temporal transition cues and the evaluation results are very encouraging, which confirms the effectiveness of our ATLAS system.


“…Due to the fact that the PowerPoint slides carried most of the information, Repp et al synchronized the imperfect transcript from the speech recognition engine automatically with the slide streams in postprocessing [14].…”

Section: Related Work (mentioning)

confidence: 99%

“…deleting stop-words and stemming of the words - the stems are stored in a database. This part of our system has already been described in [13,14].…”

Section: Identification Of Relevant Abstract (mentioning)

confidence: 99%

Question answering from lecture videos based on an automatic semantic annotation

Repp

Linckels

Meinel

2008

Proceedings of the 13th Annual Conference on Innovation and Technology in Computer Science Education

Self Cite


The number of digital lecture video recordings has increased dramatically. The accessibility, usability and the traceability of their content for students-use is limited. Therefore retrieval of audiovisual lecture recordings is a complex task. Speech recognition is applied to create a tentative and deficient transcription of the video recordings. The imperfect transcription is sufficient to generate semantic metadata serialized in an OWL file. A question answering system based on the automatically generated semantic annotations and a semantic search engine are presented. The annotation process is discussed, evaluated and compared to a perfectly annotated OWL file and, further, to a corrected transcript of the lecture.

