Time Delay Neural Network (TDNN) is a well-performing structure for DNN-based speaker recognition systems. In this paper we introduce a novel structure Crossed-Time Delay Neural Network (CTDNN) to enhance the performance of current TDNN. Inspired by the multi-filters setting of convolution layer from convolution neural network, we set multiple time delay units each with different context size at the bottom layer and construct a multilayer parallel network. The proposed CTDNN gives significant improvements over original TDNN on both speaker verification and identification tasks. It outperforms in VoxCeleb1 dataset in verification experiment with a 2.6% absolute Equal Error Rate improvement. In few shots condition CTDNN reaches 90.4% identification accuracy, which doubles the identification accuracy of original TDNN. We also compare the proposed CTDNN with another new variant of TDNN, FTDNN, which shows that our model has a 36% absolute identification accuracy improvement under few shots condition and can better handle training of a larger batch in a shorter training time, which better utilize the calculation resources.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.