The loss in performance caused by mismatch between training and test material suggests a need for task-specific acoustic models, especially for highly demanding tasks. However, since training these models is extremely expensive, general-purpose models are more attractive. In this paper we address the impact of mismatch in speaking style and task. We trained three sets of acoustic models on data from different tasks, involving both read and extemporaneous speech. The average utterance length in the training corpora ranged from 1.2 to 10.5 words. The models were tested on matched as well as on very different tasks. The results suggest that general-purpose models trained on short utterances are to be preferred in most spoken dialog systems. However, these models might not perform adequately in dictation tasks.