Language Identification (LID) is one of the most popular areas of research in speech signal processing. Nowadays, many approaches are used to improve the performance of LID systems, including Parallel Phone Recognition Language Modeling (PPRLM), Support Vector Machines (SVM), and Gaussian Mixture Models (GMM). State-of-the-art LID systems have used many kinds of feature vectors, such as LPCC, MFCC, SDC, and prosodic features. Although fusing prosodic features with MFCC features yields some improvement in LID performance, it is still not sufficient. In this paper, a baseline LID system for multilingual environments is developed using a GMM as the classifier and MFCC combined with Shifted Delta Cepstral (SDC) coefficients as front-end feature vectors. In this work, we used the Arunachali Language Speech Database (ALS-DB), a multilingual and multichannel speech corpus recently collected from four local languages of Arunachal Pradesh, namely Adi, Apatani, Galo, and Nyishi, with Hindi and English as secondary languages. Combining MFCC and SDC features improves the performance of the LID system over either feature set used alone. The minimum ERR values for MFCC and SDC individually are 19.70% and 11.83%, respectively, while the minimum ERR for the combined MFCC and SDC features is 6.40%. The combined features improve the performance of the LID system by approximately 15.00% and 6.00% over the baseline systems that use MFCC and SDC features individually, respectively.
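The abstract does not spell out how the SDC vectors are formed, so the following is only a minimal sketch of the commonly used N-d-P-k formulation, in which k delta blocks, each spaced P frames apart and computed over a span of ±d frames, are stacked for every frame. The function name `sdc`, the default values `d=1`, `p=3`, `k=7`, and the edge-padding strategy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def sdc(cepstra, d=1, p=3, k=7):
    """Shifted delta cepstra for a (T, N) matrix of cepstral frames.

    For frame t and block i (0 <= i < k) the delta is
        c[t + i*p + d] - c[t + i*p - d]
    and the k blocks are stacked into one (N * k)-dimensional vector.
    Parameter defaults are illustrative assumptions, not the paper's settings.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]
    # Repeat the first/last frame so every shifted index stays in range.
    pad_front = d
    pad_back = (k - 1) * p + d
    padded = np.pad(cepstra, ((pad_front, pad_back), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start_plus = pad_front + i * p + d
        start_minus = pad_front + i * p - d
        plus = padded[start_plus:start_plus + num_frames]
        minus = padded[start_minus:start_minus + num_frames]
        blocks.append(plus - minus)
    return np.hstack(blocks)

def combine_mfcc_sdc(mfcc, d=1, p=3, k=7):
    """Concatenate static MFCC frames with their SDC vectors, frame by frame."""
    mfcc = np.asarray(mfcc, dtype=float)
    return np.hstack([mfcc, sdc(mfcc, d=d, p=p, k=k)])
```

With 7 cepstral coefficients per frame and k = 7, `combine_mfcc_sdc` would produce 7 + 49 = 56 values per frame; the combined vectors could then be passed to whatever GMM training routine is in use.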
In this paper we report an experiment carried out on a recently collected speaker recognition database, the Arunachali Language Speech Database (ALS-DB), to compare the performance of acoustic and prosodic features for the speaker verification task. The database consists of speech recorded from 200 speakers whose mother tongue is one of the Arunachali languages of North-East India. The collected database is evaluated using a Gaussian Mixture Model-Universal Background Model (GMM-UBM) based speaker verification system. The acoustic features considered in the present study are Mel-Frequency Cepstral Coefficients (MFCC) along with their derivatives. The performance of the system has been evaluated for the acoustic and prosodic features individually as well as in combination. It has been observed that acoustic features, when considered individually, provide better performance than prosodic features. However, when prosodic features are combined with acoustic features, the system outperforms both systems in which the features are considered individually. There is a nearly 5% improvement in recognition accuracy over the system that uses only acoustic features and a nearly 20% improvement over the system that uses only prosodic features.
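To make the GMM-UBM verification step more concrete, here is a minimal sketch of the usual scoring rule: the average per-frame log-likelihood ratio between a claimed speaker's model and the universal background model. The abstract does not give model details, so the diagonal covariances, the `(weights, means, variances)` tuple layout, and the function names `gmm_loglik` and `llr_score` are all assumptions for illustration. How the speaker model is obtained (for example by adapting the UBM) is outside this sketch.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    x = np.asarray(frames, dtype=float)[:, None, :]      # (T, 1, D)
    mu = np.asarray(means, dtype=float)[None, :, :]      # (1, M, D)
    var = np.asarray(variances, dtype=float)[None, :, :]
    # Log density of each component, summed over the D dimensions.
    log_comp = -0.5 * (np.sum(np.log(2.0 * np.pi * var), axis=2)
                       + np.sum((x - mu) ** 2 / var, axis=2))  # (T, M)
    log_comp = log_comp + np.log(np.asarray(weights, dtype=float))[None, :]
    # Numerically stable log-sum-exp over the M components.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True))).ravel()

def llr_score(frames, speaker_gmm, ubm):
    """Average log-likelihood ratio; larger values favour the claimed speaker."""
    return float(np.mean(gmm_loglik(frames, *speaker_gmm)
                         - gmm_loglik(frames, *ubm)))
```

A verification decision would then compare `llr_score` against a threshold tuned on held-out trials. Scores from an acoustic system and a prosodic system could be combined, for instance by a weighted sum, although the abstract does not state which combination method the authors used.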