This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks of BBC TV output across four channels, resulting in about 1,600 hours of broadcast audio. In addition, several hundred million words of BBC subtitle text were provided for language modelling. A novel aspect of the evaluation was the exploration of speech recognition and speaker diarization in a longitudinal setting, i.e. recognition of several episodes of the same show, and speaker diarization across these episodes, linking speakers. The longitudinal tasks also offered the opportunity for systems to make use of supplied metadata including show title, genre tag, and date/time of transmission. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained.
This paper addresses the problem of unsupervised Bayesian hidden Markov chain restoration. When the hidden chain is stationary, the classical "Hidden Markov Chain" (HMC) model is quite efficient, and associated unsupervised Bayesian restoration methods using the "Expectation-Maximization" (EM) algorithm work well. When the hidden chain is nonstationary, on the other hand, the unsupervised restoration results using the HMC model can be poor, due to a bad match between the real and estimated models. The novelty of this paper is to offer a more appropriate model for hidden nonstationary Markov chains, via the theory of evidence. Using recent results relating to Triplet Markov Chains (TMCs), we show, via simulations, that the classical restoration results can be improved by the use of the theory of evidence and Dempster-Shafer fusion. The latter improvement is performed in an entirely unsupervised way using an original parameter estimation method. Some application examples to unsupervised image segmentation are also provided.
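
For readers unfamiliar with Dempster-Shafer fusion, the following is a minimal sketch of the generic Dempster rule of combination for two mass functions over a two-class frame. It is not the TMC-based method of the paper: the function name `dempster_combine`, the class labels, and the numeric masses are all hypothetical and chosen only for illustration. The first mass function places some weight on the whole frame (modelling imprecise prior knowledge), while the second is an ordinary probability on singletons.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses
    to its mass; the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal sets: the product mass counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the masses cannot be combined")
    norm = 1.0 - conflict
    return {focal: mass / norm for focal, mass in combined.items()}


# Hypothetical two-class frame {"a", "b"}.
A = frozenset({"a"})
B = frozenset({"b"})
AB = A | B

# Evidential "prior": 0.5 of the mass is left on the whole frame.
prior = {A: 0.3, B: 0.2, AB: 0.5}
# Probabilistic evidence on singletons only.
likelihood = {A: 0.6, B: 0.4}

fused = dempster_combine(prior, likelihood)
```

Because the second mass function is carried entirely by singletons, the fused result is again an ordinary probability distribution on the classes, which is what allows a Bayesian decision to be taken after fusion.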