Language Identification (LID) is one of the most active areas of research in speech signal processing. Many approaches have been used to improve the performance of LID systems, including Parallel Phone Recognition Language Modeling (PPRLM), Support Vector Machines (SVM), and Gaussian Mixture Models (GMM). State-of-the-art LID systems have used a variety of feature vectors, such as LPCC, MFCC, SDC, and prosodic features. Although fusing prosodic features with MFCC features yields some improvement in LID performance, it is still not sufficient. In this paper, a baseline LID system for multilingual environments is developed using a GMM as the classifier and MFCC combined with Shifted Delta Cepstral (SDC) features as the front-end feature vectors. In this work, we used the Arunachali Language Speech Database (ALS-DB), a multilingual and multichannel speech corpus recently collected from four local languages of Arunachal Pradesh, namely Adi, Apatani, Galo, and Nyishi, with Hindi and English as secondary languages. Combining MFCC and SDC features improves the performance of the LID system over either feature used alone. The minimum equal error rates (EER) for MFCC and SDC individually are 19.70% and 11.83%, respectively, while the minimum EER for the combined MFCC and SDC features is 6.40%. The combined features thus improve LID performance by approximately 15.00% and 6.00% over the baseline systems that use MFCC and SDC features individually, respectively.
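As an illustration of the front end described in this abstract, the sketch below computes shifted delta cepstra from a sequence of cepstral frames and concatenates them with the original MFCC frames. It is a minimal sketch, not the authors' implementation: the abstract does not state the SDC configuration, so the defaults `d=1`, `p=3`, `k=7` (a commonly used setting) and the edge handling by index clamping are assumptions, and the function names are hypothetical.

```python
from typing import List

Frame = List[float]


def shifted_delta_cepstra(cepstra: List[Frame], d: int = 1, p: int = 3, k: int = 7) -> List[Frame]:
    """Stack k delta blocks per frame: c(t + i*p + d) - c(t + i*p - d), i = 0..k-1.

    d, p, k are assumed values; the abstract does not give them.
    """
    n_frames = len(cepstra)
    if n_frames == 0:
        return []

    def frame_at(index: int) -> Frame:
        # Assumption: out-of-range indices are clamped to the first/last frame.
        return cepstra[min(max(index, 0), n_frames - 1)]

    sdc: List[Frame] = []
    for t in range(n_frames):
        stacked: Frame = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


def combine_features(mfcc: List[Frame], sdc: List[Frame]) -> List[Frame]:
    """Frame-wise concatenation of MFCC and SDC vectors."""
    return [m + s for m, s in zip(mfcc, sdc)]
```

With N cepstral coefficients per frame, each SDC vector has N·k values, so the combined vector has N·(k + 1) values per frame.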
In this paper we report experiments carried out on a recently collected speaker recognition database, the Arunachali Language Speech Database (ALS-DB), to compare the performance of acoustic and prosodic features in the speaker verification task. The database consists of speech recorded from 200 speakers whose mother tongue is one of the Arunachali languages of North-East India. The collected database is evaluated using a Gaussian Mixture Model-Universal Background Model (GMM-UBM) based speaker verification system. The acoustic features considered in the present study are Mel-Frequency Cepstral Coefficients (MFCC) along with their derivatives. The performance of the system has been evaluated for the acoustic and prosodic features individually as well as in combination. Acoustic features, when considered individually, provide better performance than prosodic features. However, when prosodic features are combined with acoustic features, the system outperforms both systems in which the features are considered individually. There is a nearly 5% improvement in recognition accuracy over the system that uses only acoustic features and a nearly 20% improvement over the system that uses only prosodic features.
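The GMM-UBM scoring mentioned in this abstract is conventionally an average per-frame log-likelihood ratio between a speaker model and the background model. The sketch below shows that standard computation for diagonal-covariance GMMs. It is an illustration under assumptions, not the authors' system: the model representation as a `(weights, means, variances)` tuple and all function names are hypothetical, and model training (including any adaptation from the UBM) is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical model layout: (weights, means, variances), diagonal covariance.
Gmm = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x: Sequence[float], gmm: Gmm) -> float:
    """Log p(x | gmm), using log-sum-exp for numerical stability."""
    weights, means, variances = gmm
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


def llr_score(frames: List[Sequence[float]], speaker: Gmm, ubm: Gmm) -> float:
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(
        gmm_log_likelihood(f, speaker) - gmm_log_likelihood(f, ubm) for f in frames
    )
    return total / len(frames)
```

A claimed identity is accepted when the score exceeds a decision threshold; a positive score means the frames fit the speaker model better than the background model.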
This paper presents a brief comparative study of the performance of different speaker modeling techniques in a robust and reliable speaker verification (SV) system. In text-independent speaker verification, many state-of-the-art speaker modeling techniques have been developed for different scenarios to improve performance. The performance of an SV system depends not only on the fusion of different feature vectors but also, to a large extent, on the fusion of various speaker modeling techniques. In this work, an automatic SV system has been developed using Mel-Frequency Cepstral Coefficients (MFCC) combined with prosodic feature vectors. The baseline SV system has been trained with several speaker modeling techniques, both separately and in fusion, namely Vector Quantization (VQ), Gaussian Mixture Model (GMM), GMM-Universal Background Model (GMM-UBM), Support Vector Machine (SVM), and Joint Factor Analysis (JFA), to analyze their performance. The results reported here have been evaluated using a multilingual speech database, the Arunachali Language Speech Database (ALS-DB). Experimentally, we observe that the best performance of the SV system is obtained by JFA with GMM-UBM modeling, with an EER of 4.76% and a MinDCF of 0.0872. Compared with the other modeling techniques, VQ performs worst, with an EER of 11.08% and a MinDCF of 0.2010. SVM shows approximately a 2.8% improvement in verification rate compared with GMM-UBM. We conclude that the fusion of generative and discriminative models greatly improves the performance of the SV system.
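The two metrics quoted in this abstract, EER and MinDCF, can be computed from lists of genuine and impostor trial scores as sketched below. This is a minimal sketch under stated assumptions: the abstract does not give the detection cost parameters, so `c_miss=10`, `c_fa=1`, and `p_target=0.01` (values used in older NIST speaker recognition evaluations) are assumptions, the cost is left unnormalized, and the function names are hypothetical.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float], threshold: float) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) at a threshold.

    A trial is accepted when its score is >= threshold.
    """
    far = sum(score >= threshold for score in impostor) / len(impostor)
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: List[float], impostor: List[float]) -> float:
    """Approximate the EER as the mean of FAR and FRR where they are closest."""
    best_gap = float("inf")
    eer = 1.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def min_dcf(genuine: List[float], impostor: List[float],
            c_miss: float = 10.0, c_fa: float = 1.0, p_target: float = 0.01) -> float:
    """Minimum unnormalized detection cost over observed score thresholds.

    Cost parameters are assumed defaults, not values from the abstract.
    """
    costs = []
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        costs.append(c_miss * frr * p_target + c_fa * far * (1.0 - p_target))
    return min(costs)
```

Sweeping only the observed scores keeps the sketch short; a finer estimate would interpolate between adjacent thresholds where FAR and FRR cross.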