This paper addresses the problem of detecting discussion scenes in instructional videos using statistical approaches. Specifically, given a series of speech segments separated from the audio tracks of educational videos, we first model the instructor using a Gaussian mixture model (GMM). A four-state transition machine is then designed to extract discussion scenes in real time based on detected instructor-student speaker change points. Meanwhile, we keep updating the GMM to accommodate the instructor's voice variation over time. Promising experimental results have been achieved on five educational (IBM MicroMBA program) videos, and very interesting instruction/teaching patterns have been observed. The extracted scene information would facilitate the semantic indexing and structuring of instructional video content.
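The abstract does not spell out the internals of the speaker model or the transition machine, so the following is only a minimal illustrative sketch of the general approach it describes: an instructor GMM fitted on acoustic features (e.g., MFCCs), a likelihood threshold to label each speech segment as instructor or student, and a four-state machine driven by the resulting speaker changes. The state names, function names, feature choice, thresholds, and the use of scikit-learn's GaussianMixture are all assumptions, not the paper's actual implementation; the paper's online GMM adaptation step is also omitted here.

```python
from sklearn.mixture import GaussianMixture

# Hypothetical state names; the abstract only says "four-state transition machine".
LECTURE, MAYBE_DISCUSSION, DISCUSSION, MAYBE_LECTURE = range(4)


def train_instructor_gmm(instructor_features, n_components=16):
    """Fit a GMM on acoustic features (rows = frames, e.g., MFCC vectors)
    taken from speech segments known to belong to the instructor."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(instructor_features)
    return gmm


def is_instructor(gmm, segment_features, threshold):
    """Label a segment by thresholding its average frame log-likelihood
    under the instructor GMM (threshold is an assumed tuning parameter)."""
    return gmm.score(segment_features) >= threshold


def detect_discussion_scenes(gmm, segments, threshold, min_close=2):
    """Run segments through a simple four-state machine driven by
    instructor/student changes; return approximate (start, end) segment
    indices of discussion scenes. The transition logic is illustrative only."""
    state, start, instr_run = LECTURE, None, 0
    scenes = []
    for i, seg in enumerate(segments):
        instr = is_instructor(gmm, seg, threshold)
        if state == LECTURE:
            if not instr:                      # student speaks: candidate scene start
                state, start = MAYBE_DISCUSSION, i
        elif state == MAYBE_DISCUSSION:
            if instr:                          # instructor answers: alternation observed
                state = DISCUSSION
        elif state == DISCUSSION:
            if instr:                          # instructor takes the floor again
                state, instr_run = MAYBE_LECTURE, 1
        elif state == MAYBE_LECTURE:
            if instr:
                instr_run += 1
                if instr_run >= min_close:     # sustained instructor speech: scene ends
                    scenes.append((start, i))
                    state = LECTURE
            else:                              # student interjects: still a discussion
                state = DISCUSSION
        # The paper additionally updates the GMM online with confirmed
        # instructor segments; that adaptation is not shown in this sketch.
    return scenes
```

In this sketch a scene opens when a student segment is confirmed by an instructor reply (the alternation that signals interaction) and closes after a run of consecutive instructor segments, so the scene boundaries are only approximate.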