Base-frequencies (F0) and spectral envelopes play an important role in speech separation and recognition by humans. Two experiments were conducted to study how trained networks for multi-speaker speech separation/recognition are affected by difference of F0 and spectral envelopes between source signals. The first experiment examined the effects of natural F0/envelope on the performance of speech separation. Results showed that when the two target signals differed in F0 by ±3 semitones or more or differed in the envelope by a scaling factor larger than 1.08 or less than 0.92, separation performance improved significantly. This is consistent with human listeners and is the first finding for deep learning-network (DNN) models. The second experiment tested the effect of F0/envelope difference on multi-speaker automatic speech recognition(ASR) system's performance. Results showed that multi-speaker recognition result also significantly rely on F0/envelope differences. The overall results indicated that the dependency of the existing automatic systems on monaural cues is similar to that of human, while automatic systems still perform inferior than human on same tasks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.