This paper concerns the development of a system for recognizing a context or environment from acoustic information alone. The system uses mel-frequency cepstral coefficients and their derivatives as features, and continuous-density hidden Markov models (HMMs) as acoustic models. We evaluate different model topologies and training methods for the HMMs and show that discriminative training can yield a 10% reduction in error rate compared to maximum-likelihood training. A listening test was conducted to measure human accuracy in the task and to obtain a baseline for assessing the system's performance. Direct comparison with human performance indicates that the system performs somewhat worse than human subjects in recognizing 18 everyday contexts and almost comparably in recognizing six higher-level categories.
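The derivative features mentioned above are commonly obtained by a regression over neighboring frames. The following is a minimal sketch of that idea, not the procedure used in the paper: the regression formula, the window half-width `width`, the edge handling (repeating the first and last frame), and the function names `delta` and `append_deltas` are all assumptions made for illustration. The input is taken to be an already computed sequence of cepstral coefficient vectors, one per frame.

```python
from typing import List

Frames = List[List[float]]  # one list of cepstral coefficients per frame


def delta(frames: Frames, width: int = 2) -> Frames:
    """Regression-based time derivative of a feature sequence.

    Assumed formula (a common choice, not taken from the paper):
        d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Frames beyond either end are replaced by the first or last frame.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out: Frames = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]   # clamp at the last frame
                behind = frames[max(t - n, 0)][k]     # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def append_deltas(frames: Frames, width: int = 2) -> Frames:
    """Concatenate each static feature vector with its delta vector."""
    return [f + d for f, d in zip(frames, delta(frames, width))]
```

For a feature that rises by one unit per frame, the interior delta values come out as 1.0, while the clamped edges give smaller values; the concatenated vectors would then serve as observations for the HMMs.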