This paper presents two new real-time approaches to the segmentation of TV news shows into topics. The goal of this research is the high-precision retrieval of topics from TV news. For that purpose, the detection of correct topic boundaries is of great importance. We introduce a stochastic and a rule-based topic model based on HMMs. The former combines features from the visual as well as from the audio channel of the news show, whereas the latter uses the video channel only. They are compared to the detection of topics using only the audio channel, which is common in many other approaches. The paper contains the following innovations: 1) The detected segment boundaries correspond directly to topics and not to video or audio cuts, as in most other segmentation methods. 2) An advanced stochastic topic model is introduced that uses audio as well as video features. 3) The introduced HMM-based approaches both outperform the audio-based approach. One algorithm has a very good topic boundary detection rate, whereas the other minimizes the number of wrongly inserted boundaries without missing too many real boundaries.
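The trade-off described in the last sentence, between detecting most real boundaries and inserting few wrong ones, can be made concrete with a small scoring sketch. This is an illustration only and not the evaluation procedure of the paper: the function name `boundary_scores`, the `tolerance` parameter, and the example boundary positions are all assumptions chosen for the example.

```python
def boundary_scores(reference, detected, tolerance=0):
    """Return (recall, precision) for detected topic boundaries.

    reference, detected: boundary positions (e.g. shot indices or seconds).
    tolerance: maximum distance at which a detection still counts as a hit
               (a hypothetical parameter, not taken from the paper).
    """
    unmatched = sorted(reference)
    hits = 0
    for d in sorted(detected):
        # Match each detection to the nearest still-unused reference boundary.
        candidates = [r for r in unmatched if abs(r - d) <= tolerance]
        if candidates:
            nearest = min(candidates, key=lambda r: abs(r - d))
            unmatched.remove(nearest)
            hits += 1
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(detected) if detected else 0.0
    return recall, precision


# Made-up boundary positions, for illustration only.
reference = [10, 25, 40, 60]
eager = [10, 18, 25, 33, 40, 52, 60]  # finds every boundary, inserts extras
cautious = [10, 25, 60]               # misses one boundary, inserts none

print(boundary_scores(reference, eager))
print(boundary_scores(reference, cautious))
```

In this toy data, the first detector has the higher detection rate (recall) and the second has the fewer wrongly inserted boundaries (higher precision), which mirrors the contrast drawn between the two algorithms.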
To capitalize on the rapid development of Speech-to-Text (STT) technologies and the proliferation of open-source machine learning toolkits, BBN has developed Sage, a new speech processing platform that integrates technologies from multiple sources, each of which has particular strengths. In this paper, we describe the design of Sage, which allows the easy interchange of STT components from different sources. We also describe our approach to fast prototyping with new machine learning toolkits, and a framework for sharing STT components across different applications. Finally, we report Sage's state-of-the-art performance on different STT tasks.
We report on recent improvements in our English/Iraqi Arabic speech-to-speech translation system. User interface improvements include a novel parallel approach to user confirmation that makes confirmation cost-free in terms of dialog duration. Automatic speech recognition improvements include the incorporation of state-of-the-art techniques in feature transformation and discriminative training. Machine translation improvements include a novel combination of multiple alignments derived from various pre-processing techniques, such as Arabic segmentation and English word compounding, higher-order N-grams for the target language model, and the use of context in the form of semantic classes and Part-of-Speech tags.