We propose a novel framework for noise-robust automatic speech recognition (ASR) based on cochlear-implant-like spectrally reduced speech (SRS). Two experimental protocols (EPs) are proposed to clarify the advantage of using SRS for noise-robust ASR; both assess SRS in the training and testing environments. In one of the two EPs, speech enhancement is used to improve the quality of the testing speech. In training, SRS is synthesized from the original clean speech, whereas in testing, SRS is synthesized directly from noisy speech or from enhanced speech signals. The synthesized SRS is recognized with ASR systems trained on SRS signals generated with the same synthesis parameters. Experiments show that the word accuracy obtained with the SRS-based ASR systems is significantly higher than that of the baseline non-SRS ASR systems. We also propose a measure of the training–testing mismatch based on the Kullback–Leibler divergence. The numerical results show that using SRS in ASR systems significantly reduces the training–testing mismatch caused by environmental noise. The HMM-based ASR systems were trained and tested with the HTK toolkit on the Aurora 2 speech database.
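The abstract does not give the exact form of its Kullback–Leibler mismatch measure, but the general idea can be sketched by fitting a diagonal Gaussian to the training and testing feature sets and computing a symmetrised KL divergence between them. A minimal illustration, with hypothetical feature arrays standing in for clean-training and noisy-testing acoustic features (e.g. MFCCs):

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL divergence D(p || q) between two diagonal Gaussians,
    summed over feature dimensions."""
    return 0.5 * np.sum(
        np.log(var_q / var_p)
        + (var_p + (mu_p - mu_q) ** 2) / var_q
        - 1.0
    )

def mismatch(train_feats, test_feats):
    """Fit a diagonal Gaussian to each feature set and return the
    symmetrised KL divergence as a training/testing mismatch score."""
    mu_p, var_p = train_feats.mean(axis=0), train_feats.var(axis=0)
    mu_q, var_q = test_feats.mean(axis=0), test_feats.var(axis=0)
    return 0.5 * (gaussian_kl(mu_p, var_p, mu_q, var_q)
                  + gaussian_kl(mu_q, var_q, mu_p, var_p))

# Toy illustration: noisy features drift away from the clean ones.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(1000, 13))    # e.g. 13 MFCC dims
noisy = clean + rng.normal(0.5, 0.3, size=clean.shape)
print(mismatch(clean, clean[::2]))   # small: matched environments
print(mismatch(clean, noisy))        # larger: mismatched environments
```

A lower score for SRS features than for raw features would correspond to the reduced mismatch the abstract reports; the Gaussian fit here is an illustrative simplification of whatever model the paper actually uses.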
Measurements from multistatic radar systems are typically subject to complicated data association, noise corruption, missed detections, and false alarms. Moreover, most current multistatic Doppler radar approaches to multitarget tracking assume a known detection probability, an assumption that can bias or even completely corrupt the estimation results. This paper proposes a method for tracking multiple targets with multistatic Doppler radar under unknown detection probability. A closed-form labeled multitarget Bayes filter is used to track an unknown and time-varying number of targets with unknown detection probability in the presence of clutter, missed detections, and association uncertainty. The efficiency of the proposed algorithm is illustrated via numerical simulation examples.
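The abstract does not give the filter's equations, but a common device in robust multitarget filters for handling an unknown detection probability is to model it with a Beta density that is updated conjugately on detection/miss events. A heavily simplified, single-target, clutter-free sketch of that idea (all names and values here are illustrative, not the paper's method):

```python
import numpy as np

# Beta(s, t) prior on the unknown detection probability p_D;
# s = t = 1 is the uniform (uninformative) prior.
s, t = 1.0, 1.0

rng = np.random.default_rng(1)
TRUE_PD = 0.7                            # ground truth, unknown to the filter
detections = rng.random(500) < TRUE_PD   # simulated detect/miss flags

for detected in detections:
    if detected:
        s += 1.0     # detection observed: s <- s + 1
    else:
        t += 1.0     # missed detection:   t <- t + 1

pd_est = s / (s + t)  # posterior-mean estimate of p_D
```

In a full labeled multitarget filter this update is carried per track and interleaved with the kinematic update under clutter and association uncertainty; the sketch only shows why the detection probability need not be assumed known.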
End-to-end models yield impressive speech recognition results on clean datasets but perform considerably worse on noisy ones. To address this, we propose transfer learning from a clean dataset (WSJ) to a noisy dataset (CHiME-4) for connectionist temporal classification (CTC) models. We argue that the clean classifier (the upper layers of a neural network trained on clean data) can force the feature extractor (the lower layers) to learn the underlying noise-invariant patterns in the noisy dataset. While training on the noisy dataset, the clean classifier is either frozen or trained with a small learning rate, whereas the feature extractor is trained with no learning-rate rescaling. The proposed method yields up to a 15.5% relative character error rate (CER) reduction compared to models trained only on CHiME-4. Furthermore, we use the Aurora-4 test sets to evaluate on unseen noisy conditions. Our method achieves significantly lower CERs (11.3% relative on average) on all 14 Aurora-4 test sets than the conventional transfer learning method (no learning-rate rescaling for any layer), indicating that it enables the model to learn noise-invariant features.
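The core mechanism above is simply per-layer learning rates: the upper "clean classifier" layers are frozen or updated very slowly, while the lower feature-extractor layers adapt at the full rate. A minimal toy sketch of that idea with two linear layers and manual gradients (layer names, sizes, and rates are all illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the two-stage model:
# W1 = feature extractor (lower layers), W2 = "clean classifier".
W1 = rng.normal(size=(8, 4)) * 0.1
W2 = rng.normal(size=(4, 2)) * 0.1
W2_init = W2.copy()

X = rng.normal(size=(64, 8))    # "noisy" input features (toy data)
Y = rng.normal(size=(64, 2))    # targets for a toy regression loss

LR_EXTRACTOR = 1e-2             # full rate: no rescaling
LR_CLASSIFIER = 1e-4            # small rate (set to 0.0 to freeze)

init_loss = np.mean((X @ W1 @ W2 - Y) ** 2)
for step in range(200):
    H = X @ W1                  # feature-extractor output
    P = H @ W2                  # classifier output
    G = 2.0 * (P - Y) / len(X)  # dLoss/dP for the MSE loss
    gW2 = H.T @ G               # classifier gradient
    gW1 = X.T @ (G @ W2.T)      # extractor gradient
    W2 -= LR_CLASSIFIER * gW2   # barely moves: quasi-frozen
    W1 -= LR_EXTRACTOR * gW1    # adapts freely to the noisy data

loss = np.mean((X @ W1 @ W2 - Y) ** 2)
```

After training, the loss has dropped while the classifier weights are nearly unchanged: the near-static upper layers constrain what representations the lower layers must produce, which is the intuition the abstract appeals to.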