We present the USC Speech and Vocal Tract Morphology MRI Database, a 17-speaker magnetic resonance imaging database for speech research. The database consists of real-time magnetic resonance images (rtMRI) of dynamic vocal tract shaping, denoised audio recorded simultaneously with rtMRI, and 3D volumetric MRI of vocal tract shapes during sustained speech sounds. We acquired 2D real-time MRI of vocal tract shaping during consonant-vowel-consonant sequences, vowel-consonant-vowel sequences, read passages, and spontaneous speech. We acquired 3D volumetric MRI of the full set of vowels and continuant consonants of American English. Each 3D volumetric MRI was acquired in one 7-second scan in which the participant sustained the sound. This is the first database to combine rtMRI of dynamic vocal tract shaping and 3D volumetric MRI of the entire vocal tract. The database provides a unique resource with which to examine the relationship between vocal tract morphology and vocal tract function.
In this work, we investigate the efficacy of Micro Electro-Mechanical System (MEMS) microphones, a newly developed technology of very compact sensors, for multichannel speech enhancement. Experiments are conducted on real speech data collected using a MEMS microphone array. First, the effectiveness of the array geometry for noise suppression is explored, using a new corpus containing speech recorded in diffuse and localized noise fields with a MEMS microphone array configured in linear and hexagonal array geometries. Our results indicate superior performance of the hexagonal geometry. Then, MEMS microphones are compared to Electret Condenser Microphones (ECMs), using the ATHENA database, which contains speech recorded in realistic smart home noise conditions with hexagonal-type arrays of both microphone types. MEMS microphones exhibit performance similar to ECMs. Good performance, versatility in placement, small size, and low cost make MEMS microphones attractive for multichannel speech processing.
In this paper, we examine three problems that arise in the challenging area of far-field speech processing. The methods developed for each problem, namely (a) multichannel speech enhancement, (b) voice activity detection, and (c) speech recognition, are potentially applicable to a distant speech recognition system for voice-enabled smart home environments. The results obtained on real and simulated data for smart home speech applications are promising, owing to the improvements made in the employed signal processing methods.